summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2021-07-20x86/signal: Detect and prevent an alternate signal stack overflowChang S. Bae
[ Upstream commit 2beb4a53fc3f1081cedc1c1a198c7f56cc4fc60c ] The kernel pushes context on to the userspace stack to prepare for the user's signal handler. When the user has supplied an alternate signal stack, via sigaltstack(2), it is easy for the kernel to verify that the stack size is sufficient for the current hardware context. Check if writing the hardware context to the alternate stack will exceed it's size. If yes, then instead of corrupting user-data and proceeding with the original signal handler, an immediate SIGSEGV signal is delivered. Refactor the stack pointer check code from on_sig_stack() and use the new helper. While the kernel allows new source code to discover and use a sufficient alternate signal stack size, this check is still necessary to protect binaries with insufficient alternate signal stack size from data corruption. Fixes: c2bc11f10a39 ("x86, AVX-512: Enable AVX-512 States Context Switch") Reported-by: Florian Weimer <fweimer@redhat.com> Suggested-by: Jann Horn <jannh@google.com> Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Len Brown <len.brown@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20210518200320.17239-6-chang.seok.bae@intel.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=153531 Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20NFS: nfs_find_open_context() may only select open filesTrond Myklebust
[ Upstream commit e97bc66377bca097e1f3349ca18ca17f202ff659 ] If a file has already been closed, then it should not be selected to support further I/O. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> [Trond: Fix an invalid pointer deref reported by Colin Ian King] Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20kcov: add __no_sanitize_coverage to fix noinstr for all architecturesMarco Elver
[ Upstream commit 540540d06e9d9b3769b46d88def90f7e7c002322 ] Until now no compiler supported an attribute to disable coverage instrumentation as used by KCOV. To work around this limitation on x86, noinstr functions have their coverage instrumentation turned into nops by objtool. However, this solution doesn't scale automatically to other architectures, such as arm64, which are migrating to use the generic entry code. Clang [1] and GCC [2] have added support for the attribute recently. [1] https://github.com/llvm/llvm-project/commit/280333021e9550d80f5c1152a34e33e81df1e178 [2] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=cec4d4a6782c9bd8d071839c50a239c49caca689 The changes will appear in Clang 13 and GCC 12. Add __no_sanitize_coverage for both compilers, and add it to noinstr. Note: In the Clang case, __has_feature(coverage_sanitizer) is only true if the feature is enabled, and therefore we do not require an additional defined(CONFIG_KCOV) (like in the GCC case where __has_attribute(..) is always true) to avoid adding redundant attributes to functions if KCOV is off. That being said, compilers that support the attribute will not generate errors/warnings if the attribute is redundantly used; however, where possible let's avoid it as it reduces preprocessed code size and associated compile-time overheads. [elver@google.com: Implement __has_feature(coverage_sanitizer) in Clang] Link: https://lkml.kernel.org/r/20210527162655.3246381-1-elver@google.com [elver@google.com: add comment explaining __has_feature() in Clang] Link: https://lkml.kernel.org/r/20210527194448.3470080-1-elver@google.com Link: https://lkml.kernel.org/r/20210525175819.699786-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20scsi: iscsi: Fix conn use after free during resetsMike Christie
[ Upstream commit ec29d0ac29be366450a7faffbcf8cba3a6a3b506 ] If we haven't done a unbind target call we can race where iscsi_conn_teardown wakes up the EH thread and then frees the conn while those threads are still accessing the conn ehwait. We can only do one TMF per session so this just moves the TMF fields from the conn to the session. We can then rely on the iscsi_session_teardown->iscsi_remove_session->__iscsi_unbind_session call to remove the target and it's devices, and know after that point there is no device or scsi-ml callout trying to access the session. Link: https://lore.kernel.org/r/20210525181821.7617-14-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20scsi: iscsi: Add iscsi_cls_conn refcount helpersMike Christie
[ Upstream commit b1d19e8c92cfb0ded180ef3376c20e130414e067 ] There are a couple places where we could free the iscsi_cls_conn while it's still in use. This adds some helpers to get/put a refcount on the struct and converts an exiting user. Subsequent commits will then use the helpers to fix 2 bugs in the eh code. Link: https://lore.kernel.org/r/20210525181821.7617-11-michael.christie@oracle.com Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20rcu: Reject RCU_LOCKDEP_WARN() false positivesPaul E. McKenney
[ Upstream commit 3066820034b5dd4e89bd74a7739c51c2d6f5e554 ] If another lockdep report runs concurrently with an RCU lockdep report from RCU_LOCKDEP_WARN(), the following sequence of events can occur: 1. debug_lockdep_rcu_enabled() sees that lockdep is enabled when called from (say) synchronize_rcu(). 2. Lockdep is disabled by a concurrent lockdep report. 3. debug_lockdep_rcu_enabled() evaluates its lockdep-expression argument, for example, lock_is_held(&rcu_bh_lock_map). 4. Because lockdep is now disabled, lock_is_held() plays it safe and returns the constant 1. 5. But in this case, the constant 1 is not safe, because invoking synchronize_rcu() under rcu_read_lock_bh() is disallowed. 6. debug_lockdep_rcu_enabled() wrongly invokes lockdep_rcu_suspicious(), resulting in a false-positive splat. This commit therefore changes RCU_LOCKDEP_WARN() to check debug_lockdep_rcu_enabled() after checking the lockdep expression, so that any "safe" returns from lock_is_held() are rejected by debug_lockdep_rcu_enabled(). This requires memory ordering, which is supplied by READ_ONCE(debug_locks). The resulting volatile accesses prevent the compiler from reordering and the fact that only one variable is being accessed prevents the underlying hardware from reordering. The combination works for IA64, which can reorder reads to the same location, but this is defeated by the volatile accesses, which compile to load instructions that provide ordering. Reported-by: syzbot+dde0cc33951735441301@syzkaller.appspotmail.com Reported-by: Matthew Wilcox <willy@infradead.org> Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com Reported-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-19media: subdev: disallow ioctl for saa6588/davinciArnd Bergmann
commit 0a7790be182d32b9b332a37cb4206e24fe94b728 upstream. The saa6588_ioctl() function expects to get called from other kernel functions with a 'saa6588_command' pointer, but I found nothing stops it from getting called from user space instead, which seems rather dangerous. The same thing happens in the davinci vpbe driver with its VENC_GET_FLD command. As a quick fix, add a separate .command() callback pointer for this driver and change the two callers over to that. This change can easily get backported to stable kernels if necessary, but since there are only two drivers, we may want to eventually replace this with a set of more specialized callbacks in the long run. Fixes: c3fda7f835b0 ("V4L/DVB (10537): saa6588: convert to v4l2_subdev.") Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-19rq-qos: fix missed wake-ups in rq_qos_throttle try twoJan Kara
commit 11c7aa0ddea8611007768d3e6b58d45dc60a19e1 upstream. Commit 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle") tried to fix a problem that a process could be sleeping in rq_qos_wait() without anyone to wake it up. However the fix is not complete and the following can still happen: CPU1 (waiter1) CPU2 (waiter2) CPU3 (waker) rq_qos_wait() rq_qos_wait() acquire_inflight_cb() -> fails acquire_inflight_cb() -> fails completes IOs, inflight decreased prepare_to_wait_exclusive() prepare_to_wait_exclusive() has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers has_sleeper = !wq_has_single_sleeper() -> true io_schedule() io_schedule() Deadlock as now there's nobody to wakeup the two waiters. The logic automatically blocking when there are already sleepers is really subtle and the only way to make it work reliably is that we check whether there are some waiters in the queue when adding ourselves there. That way, we are guaranteed that at least the first process to enter the wait queue will recheck the waiting condition before going to sleep and thus guarantee forward progress. Fixes: 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-19power: supply: ab8500: Fix an old bugLinus Walleij
commit f1c74a6c07e76fcb31a4bcc1f437c4361a2674ce upstream. Trying to get the AB8500 charging driver working I ran into a bit of bitrot: we haven't used the driver for a while so errors in refactorings won't be noticed. This one is pretty self evident: use argument to the macro or we end up with a random pointer to something else. Cc: stable@vger.kernel.org Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Marcus Cooper <codekipper@gmail.com> Fixes: 297d716f6260 ("power_supply: Change ownership from driver to core") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-19scsi: iscsi: Fix race condition between login and sync threadGulam Mohamed
commit 9e67600ed6b8565da4b85698ec659b5879a6c1c6 upstream. A kernel panic was observed due to a timing issue between the sync thread and the initiator processing a login response from the target. The session reopen can be invoked both from the session sync thread when iscsid restarts and from iscsid through the error handler. Before the initiator receives the response to a login, another reopen request can be sent from the error handler/sync session. When the initial login response is subsequently processed, the connection has been closed and the socket has been released. To fix this a new connection state, ISCSI_CONN_BOUND, is added: - Set the connection state value to ISCSI_CONN_DOWN upon iscsi_if_ep_disconnect() and iscsi_if_stop_conn() - Set the connection state to the newly created value ISCSI_CONN_BOUND after bind connection (transport->bind_conn()) - In iscsi_set_param(), return -ENOTCONN if the connection state is not either ISCSI_CONN_BOUND or ISCSI_CONN_UP Link: https://lore.kernel.org/r/20210325093248.284678-1-gulam.mohamed@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-19sctp: validate from_addr_param returnMarcelo Ricardo Leitner
[ Upstream commit 0c5dc070ff3d6246d22ddd931f23a6266249e3db ] Ilja reported that, simply putting it, nothing was validating that from_addr_param functions were operating on initialized memory. That is, the parameter itself was being validated by sctp_walk_params, but it doesn't check for types and their specific sizes and it could be a 0-length one, causing from_addr_param to potentially work over the next parameter or even uninitialized memory. The fix here is to, in all calls to from_addr_param, check if enough space is there for the wanted IP address type. Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-19flow_offload: action should not be NULL when it is referencedgushengxian
[ Upstream commit 9ea3e52c5bc8bb4a084938dc1e3160643438927a ] "action" should not be NULL when it is referenced. Signed-off-by: gushengxian <13145886936@163.com> Signed-off-by: gushengxian <gushengxian@yulong.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-19net: fix mistake path for netdev_features_stringsJian Shen
[ Upstream commit 2d8ea148e553e1dd4e80a87741abdfb229e2b323 ] Th_strings arrays netdev_features_strings, tunable_strings, and phy_tunable_strings has been moved to file net/ethtool/common.c. So fixes the comment. Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-19net: mdio: provide shim implementation of devm_of_mdiobus_registerVladimir Oltean
[ Upstream commit 86544c3de6a2185409c5a3d02f674ea223a14217 ] Similar to the way in which of_mdiobus_register() has a fallback to the non-DT based mdiobus_register() when CONFIG_OF is not set, we can create a shim for the device-managed devm_of_mdiobus_register() which calls devm_mdiobus_register() and discards the struct device_node *. In particular, this solves a build issue with the qca8k DSA driver which uses devm_of_mdiobus_register and can be compiled without CONFIG_OF. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14block: return the correct bvec when checking for gapsLong Li
commit c9c9762d4d44dcb1b2ba90cfb4122dc11ceebf31 upstream. After commit 07173c3ec276 ("block: enable multipage bvecs"), a bvec can have multiple pages. But bio_will_gap() still assumes one page bvec while checking for merging. If the pages in the bvec go across the seg_boundary_mask, this check for merging can potentially succeed if only the 1st page is tested, and can fail if all the pages are tested. Later, when SCSI builds the SG list the same check for merging is done in __blk_segment_map_sg_merge() with all the pages in the bvec tested. This time the check may fail if the pages in bvec go across the seg_boundary_mask (but tested okay in bio_will_gap() earlier, so those BIOs were merged). If this check fails, we end up with a broken SG list for drivers assuming the SG list not having offsets in intermediate pages. This results in incorrect pages written to the disk. Fix this by returning the multi-page bvec when testing gaps for merging. Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 07173c3ec276 ("block: enable multipage bvecs") Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/1623094445-22332-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14scsi: fc: Correct RHBA attributes lengthJaved Hasan
commit 40445fd2c9fa427297acdfcc2c573ff10493f209 upstream. As per the FC-GS-5 specification, attribute lengths of node_name and manufacturer should in range of "4 to 64 Bytes" only. Link: https://lore.kernel.org/r/20210603101404.7841-2-jhasan@marvell.com Fixes: e721eb0616f6 ("scsi: scsi_transport_fc: Match HBA Attribute Length with HBAAPI V2.0 definitions") CC: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-07-14include/linux/huge_mm.h: remove extern keywordRalph Campbell
[ Upstream commit ebfe1b8f6ea5d83d8c1aa18ddd8ede432a7414e7 ] The external function definitions don't need the "extern" keyword. Remove them so future changes don't copy the function definition style. Link: https://lkml.kernel.org/r/20201106235135.32109-1-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14mm/huge_memory.c: add missing read-only THP checking in ↵Miaohe Lin
transparent_hugepage_enabled() [ Upstream commit e6be37b2e7bddfe0c76585ee7c7eee5acc8efeab ] Since commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS"), read-only THP file mapping is supported. But it forgot to add checking for it in transparent_hugepage_enabled(). To fix it, we add checking for read-only THP file mapping and also introduce helper transhuge_vma_enabled() to check whether thp is enabled for specified vma to reduce duplicated code. We rename transparent_hugepage_enabled to transparent_hugepage_active to make the code easier to follow as suggested by David Hildenbrand. [linmiaohe@huawei.com: define transhuge_vma_enabled next to transhuge_vma_suitable] Link: https://lkml.kernel.org/r/20210514093007.4117906-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20210511134857.1581273-4-linmiaohe@huawei.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Rik van Riel <riel@surriel.com> Cc: Song Liu <songliubraving@fb.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14mm/huge_memory.c: remove dedicated macro HPAGE_CACHE_INDEX_MASKMiaohe Lin
[ Upstream commit b2bd53f18bb7f7cfc91b3bb527d7809376700a8e ] Patch series "Cleanup and fixup for huge_memory:, v3. This series contains cleanups to remove dedicated macro and remove unnecessary tlb_remove_page_size() for huge zero pmd. Also this adds missing read-only THP checking for transparent_hugepage_enabled() and avoids discarding hugepage if other processes are mapping it. More details can be found in the respective changelogs. Thi patch (of 5): Rewrite the pgoff checking logic to remove macro HPAGE_CACHE_INDEX_MASK which is only used here to simplify the code. Link: https://lkml.kernel.org/r/20210511134857.1581273-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20210511134857.1581273-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Song Liu <songliubraving@fb.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14mm/pmem: avoid inserting hugepage PTE entry with fsdax if hugepage support ↵Aneesh Kumar K.V
is disabled [ Upstream commit bae84953815793f68ddd8edeadd3f4e32676a2c8 ] Differentiate between hardware not supporting hugepages and user disabling THP via 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' For the devdax namespace, the kernel handles the above via the supported_alignment attribute and failing to initialize the namespace if the namespace align value is not supported on the platform. For the fsdax namespace, the kernel will continue to initialize the namespace. This can result in the kernel creating a huge pte entry even though the hardware don't support the same. We do want hugepage support with pmem even if the end-user disabled THP via sysfs file (/sys/kernel/mm/transparent_hugepage/enabled). Hence differentiate between hardware/firmware lacking support vs user-controlled disable of THP and prevent a huge fault if the hardware lacks hugepage support. Link: https://lkml.kernel.org/r/20210205023956.417587-1-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14iio: cros_ec_sensors: Fix alignment of buffer in ↵Jonathan Cameron
iio_push_to_buffers_with_timestamp() [ Upstream commit 8dea228b174ac9637b567e5ef54f4c40db4b3c41 ] The samples buffer is passed to iio_push_to_buffers_with_timestamp() which requires a buffer aligned to 8 bytes as it is assumed that the timestamp will be naturally aligned if present. Fixes tag is inaccurate but prior to that likely manual backporting needed (for anything before 4.18) Earlier than that the include file to fix is drivers/iio/common/cros_ec_sensors/cros_ec_sensors_core.h: commit 974e6f02e27 ("iio: cros_ec_sensors_core: Add common functions for the ChromeOS EC Sensor Hub.") present since kernel stable 4.10. (Thanks to Gwendal for tracking this down) Fixes: 5a0b8cb46624c ("iio: cros_ec: Move cros_ec_sensors_core.h in /include") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gwendal Grignou <gwendal@chromium.org Link: https://lore.kernel.org/r/20210501171352.512953-7-jic23@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14net: lwtunnel: handle MTU calculation in forwadingVadim Fedorenko
[ Upstream commit fade56410c22cacafb1be9f911a0afd3701d8366 ] Commit 14972cbd34ff ("net: lwtunnel: Handle fragmentation") moved fragmentation logic away from lwtunnel by carry encap headroom and use it in output MTU calculation. But the forwarding part was not covered and created difference in MTU for output and forwarding and further to silent drops on ipv4 forwarding path. Fix it by taking into account lwtunnel encap headroom. The same commit also introduced difference in how to treat RTAX_MTU in IPv4 and IPv6 where latter explicitly removes lwtunnel encap headroom from route MTU. Make IPv4 version do the same. Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation") Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14Bluetooth: Fix Set Extended (Scan Response) DataLuiz Augusto von Dentz
[ Upstream commit c9ed0a7077306f9d41d74fb006ab5dbada8349c5 ] These command do have variable length and the length can go up to 251, so this changes the struct to not use a fixed size and then when creating the PDU only the actual length of the data send to the controller. Fixes: a0fb3726ba551 ("Bluetooth: Use Set ext adv/scan rsp data if controller supports") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14net: macsec: fix the length used to copy the key for offloadingAntoine Tenart
[ Upstream commit 1f7fe5121127e037b86592ba42ce36515ea0e3f7 ] The key length used when offloading macsec to Ethernet or PHY drivers was set to MACSEC_KEYID_LEN (16), which is an issue as: - This was never meant to be the key length. - The key length can be > 16. Fix this by using MACSEC_MAX_KEY_LEN to store the key (the max length accepted in uAPI) and secy->key_len to copy it. Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Reported-by: Lior Nahmanson <liorna@nvidia.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14net: sched: add barrier to ensure correct ordering for lockless qdiscYunsheng Lin
[ Upstream commit 89837eb4b2463c556a123437f242d6c2bc62ce81 ] The spin_trylock() was assumed to contain the implicit barrier needed to ensure the correct ordering between STATE_MISSED setting/clearing and STATE_MISSED checking in commit a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc"). But it turns out that spin_trylock() only has load-acquire semantic, for strongly-ordered system(like x86), the compiler barrier implicitly contained in spin_trylock() seems enough to ensure the correct ordering. But for weakly-orderly system (like arm64), the store-release semantic is needed to ensure the correct ordering as clear_bit() and test_bit() is store operation, see queued_spin_lock(). So add the explicit barrier to ensure the correct ordering for the above case. Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14xsk: Fix missing validation for skb and unaligned modeMagnus Karlsson
[ Upstream commit 2f99619820c2269534eb2c0cde44870313c6d353 ] Fix a missing validation of a Tx descriptor when executing in skb mode and the umem is in unaligned mode. A descriptor could point to a buffer straddling the end of the umem, thus effectively tricking the kernel to read outside the allowed umem region. This could lead to a kernel crash if that part of memory is not mapped. In zero-copy mode, the descriptor validation code rejects such descriptors by checking a bit in the DMA address that tells us if the next page is physically contiguous or not. For the last page in the umem, this bit is not set, therefore any descriptor pointing to a packet straddling this last page boundary will be rejected. However, the skb path does not use this bit since it copies out data and can do so to two different pages. (It also does not have the array of DMA address, so it cannot even store this bit.) The code just returned that the packet is always physically contiguous. But this is unfortunately also returned for the last page in the umem, which means that packets that cross the end of the umem are being allowed, which they should not be. Fix this by introducing a check for this in the SKB path only, not penalizing the zero-copy path. Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20210617092255.3487-1-magnus.karlsson@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14clk: imx8mq: remove SYS PLL 1/2 clock gatesLucas Stach
[ Upstream commit c586f53ae159c6c1390f093a1ec94baef2df9f3a ] Remove the PLL clock gates as the allowing to gate the sys1_pll_266m breaks the uSDHC module which is sporadically unable to enumerate devices after this change. Also it makes AMP clock management harder with no obvious benefit to Linux, so just revert the change. Link: https://lore.kernel.org/r/20210528180135.1640876-1-l.stach@pengutronix.de Fixes: b04383b6a558 ("clk: imx8mq: Define gates for pll1/2 fixed dividers") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Abel Vesa <abel.vesa@nxp.com> Signed-off-by: Abel Vesa <abel.vesa@nxp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14net/sched: act_vlan: Fix modify to allow 0Boris Sukholitko
[ Upstream commit 9c5eee0afca09cbde6bd00f77876754aaa552970 ] Currently vlan modification action checks existence of vlan priority by comparing it to 0. Therefore it is impossible to modify existing vlan tag to have priority 0. For example, the following tc command will change the vlan id but will not affect vlan priority: tc filter add dev eth1 ingress matchall action vlan modify id 300 \ priority 0 pipe mirred egress redirect dev eth2 The incoming packet on eth1: ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4 will be changed to: ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4 although the user has intended to have p == 0. The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params and rely on it when deciding to set the priority. Fixes: 45a497f2d149a4a8061c (net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action) Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14xfrm: xfrm_state_mtu should return at least 1280 for ipv6Sabrina Dubroca
[ Upstream commit b515d2637276a3810d6595e10ab02c13bfd0b63a ] Jianwen reported that IPv6 Interoperability tests are failing in an IPsec case where one of the links between the IPsec peers has an MTU of 1280. The peer generates a packet larger than this MTU, the router replies with a "Packet too big" message indicating an MTU of 1280. When the peer tries to send another large packet, xfrm_state_mtu returns 1280 - ipsec_overhead, which causes ip6_setup_cork to fail with EINVAL. We can fix this by forcing xfrm_state_mtu to return IPV6_MIN_MTU when IPv6 is used. After going through IPsec, the packet will then be fragmented to obey the actual network's PMTU, just before leaving the host. Currently, TFC padding is capped to PMTU - overhead to avoid fragementation: after padding and encapsulation, we still fit within the PMTU. That behavior is preserved in this patch. Fixes: 91657eafb64b ("xfrm: take net hdr len into account for esp payload size calculation") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14swap: fix do_swap_page() race with swapoffMiaohe Lin
[ Upstream commit 2799e77529c2a25492a4395db93996e3dacd762d ] When I was investigating the swap code, I found the below possible race window: CPU 1 CPU 2 ----- ----- do_swap_page if (data_race(si->flags & SWP_SYNCHRONOUS_IO) swap_readpage if (data_race(sis->flags & SWP_FS_OPS)) { swapoff .. p->swap_file = NULL; .. struct file *swap_file = sis->swap_file; struct address_space *mapping = swap_file->f_mapping;[oops!] Note that for the pages that are swapped in through swap cache, this isn't an issue. Because the page is locked, and the swap entry will be marked with SWAP_HAS_CACHE, so swapoff() can not proceed until the page has been unlocked. Fix this race by using get/put_swap_device() to guard against concurrent swapoff. Link: https://lkml.kernel.org/r/20210426123316.806267-3-linmiaohe@huawei.com Fixes: 0bcac06f27d7 ("mm,swap: skip swapcache for swapin of synchronous device") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Alex Shi <alexs@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14clocksource/drivers/timer-ti-dm: Save and restore timer TIOCP_CFGTony Lindgren
[ Upstream commit 9517c577f9f722270584cfb1a7b4e1354e408658 ] As we are using cpu_pm to save and restore context, we must also save and restore the timer sysconfig register TIOCP_CFG. This is needed because we are not calling PM runtime functions at all with cpu_pm. Fixes: b34677b0999a ("clocksource/drivers/timer-ti-dm: Implement cpu_pm notifier for context save and restore") Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: Adam Ford <aford173@gmail.com> Cc: Andreas Kemnade <andreas@kemnade.info> Cc: Lokesh Vutla <lokeshvutla@ti.com> Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210415085506.56828-1-tony@atomide.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14media: v4l2-async: Clean v4l2_async_notifier_add_fwnode_remote_subdevEzequiel Garcia
[ Upstream commit c1cf3d896d124e3e00794f9bfbde49f0fc279e3f ] Change v4l2_async_notifier_add_fwnode_remote_subdev semantics so it allocates the struct v4l2_async_subdev pointer. This makes the API consistent: the v4l2-async subdevice addition functions have now a unified usage model. This model is simpler, as it makes v4l2-async responsible for the allocation and release of the subdevice descriptor, and no longer something the driver has to worry about. On the user side, the change makes the API simpler for the drivers to use and less error-prone. Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Helen Koike <helen.koike@collabora.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14clocksource: Check per-CPU clock synchronization when marked unstablePaul E. McKenney
[ Upstream commit 7560c02bdffb7c52d1457fa551b9e745d4b9e754 ] Some sorts of per-CPU clock sources have a history of going out of synchronization with each other. However, this problem has purportedy been solved in the past ten years. Except that it is all too possible that the problem has instead simply been made less likely, which might mean that some of the occasional "Marking clocksource 'tsc' as unstable" messages might be due to desynchronization. How would anyone know? Therefore apply CPU-to-CPU synchronization checking to newly unstable clocksource that are marked with the new CLOCK_SOURCE_VERIFY_PERCPU flag. Lists of desynchronized CPUs are printed, with the caveat that if it is the reporting CPU that is itself desynchronized, it will appear that all the other clocks are wrong. Just like in real life. Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Feng Tang <feng.tang@intel.com> Link: https://lore.kernel.org/r/20210527190124.440372-2-paulmck@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14random32: Fix implicit truncation warning in prandom_seed_state()Richard Fitzgerald
[ Upstream commit d327ea15a305024ef0085252fa3657bbb1ce25f5 ] sparse generates the following warning: include/linux/prandom.h:114:45: sparse: sparse: cast truncates bits from constant value This is because the 64-bit seed value is manipulated and then placed in a u32, causing an implicit cast and truncation. A forced cast to u32 doesn't prevent this warning, which is reasonable because a typecast doesn't prove that truncation was expected. Logical-AND the value with 0xffffffff to make explicit that truncation to 32-bit is intended. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210525122012.6336-3-rf@opensource.cirrus.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14media: Fix Media Controller API config checksShuah Khan
[ Upstream commit 50e7a31d30e8221632675abed3be306382324ca2 ] Smatch static checker warns that "mdev" can be null: sound/usb/media.c:287 snd_media_device_create() warn: 'mdev' can also be NULL If CONFIG_MEDIA_CONTROLLER is disabled, this file should not be included in the build. The below conditions in the sound/usb/Makefile are in place to ensure that media.c isn't included in the build. sound/usb/Makefile: snd-usb-audio-$(CONFIG_SND_USB_AUDIO_USE_MEDIA_CONTROLLER) += media.o select SND_USB_AUDIO_USE_MEDIA_CONTROLLER if MEDIA_CONTROLLER && (MEDIA_SUPPORT=y || MEDIA_SUPPORT=SND_USB_AUDIO) The following config check in include/media/media-dev-allocator.h is in place to enable the API only when CONFIG_MEDIA_CONTROLLER and CONFIG_USB are enabled. #if defined(CONFIG_MEDIA_CONTROLLER) && defined(CONFIG_USB) This check doesn't work as intended when CONFIG_USB=m. When CONFIG_USB=m, CONFIG_USB_MODULE is defined and CONFIG_USB is not. The above config check doesn't catch that CONFIG_USB is defined as a module and disables the API. This results in sound/usb enabling Media Controller specific ALSA driver code, while Media disables the Media Controller API. Fix the problem requires two changes: 1. Change the check to use IS_ENABLED to detect when CONFIG_USB is enabled as a module or static. Since CONFIG_MEDIA_CONTROLLER is a bool, leave the check unchanged to be consistent with drivers/media/Makefile. 2. Change the drivers/media/mc/Makefile to include mc-dev-allocator.o in mc-objs when CONFIG_USB is enabled. Link: https://lore.kernel.org/alsa-devel/YLeAvT+R22FQ%2FEyw@mwanda/ Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14crypto: shash - avoid comparing pointers to exported functions under CFIArd Biesheuvel
[ Upstream commit 22ca9f4aaf431a9413dcc115dd590123307f274f ] crypto_shash_alg_has_setkey() is implemented by testing whether the .setkey() member of a struct shash_alg points to the default version, called shash_no_setkey(). As crypto_shash_alg_has_setkey() is a static inline, this requires shash_no_setkey() to be exported to modules. Unfortunately, when building with CFI, function pointers are routed via CFI stubs which are private to each module (or to the kernel proper) and so this function pointer comparison may fail spuriously. Let's fix this by turning crypto_shash_alg_has_setkey() into an out of line function. Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14media: hevc: Fix dependent slice segment flagsJernej Skrabec
[ Upstream commit 67a7e53d5b21f3a84efc03a4e62db7caf97841ef ] Dependent slice segment flag for PPS control is misnamed. It should have "enabled" at the end. It only tells if this flag is present in slice header or not and not the actual value. Fix this by renaming the PPS flag and introduce another flag for slice control which tells actual value. Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14sched/core: Initialize the idle task with preemption disabledValentin Schneider
[ Upstream commit f1a0a376ca0c4ef1fc3d24e3e502acbb5b795674 ] As pointed out by commit de9b8f5dcbd9 ("sched: Fix crash trying to dequeue/enqueue the idle thread") init_idle() can and will be invoked more than once on the same idle task. At boot time, it is invoked for the boot CPU thread by sched_init(). Then smp_init() creates the threads for all the secondary CPUs and invokes init_idle() on them. As the hotplug machinery brings the secondaries to life, it will issue calls to idle_thread_get(), which itself invokes init_idle() yet again. In this case it's invoked twice more per secondary: at _cpu_up(), and at bringup_cpu(). Given smp_init() already initializes the idle tasks for all *possible* CPUs, no further initialization should be required. Now, removing init_idle() from idle_thread_get() exposes some interesting expectations with regards to the idle task's preempt_count: the secondary startup always issues a preempt_disable(), requiring some reset of the preempt count to 0 between hot-unplug and hotplug, which is currently served by idle_thread_get() -> idle_init(). Given the idle task is supposed to have preemption disabled once and never see it re-enabled, it seems that what we actually want is to initialize its preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove init_idle() from idle_thread_get(). Secondary startups were patched via coccinelle: @begone@ @@ -preempt_disable(); ... cpu_startup_entry(CPUHP_AP_ONLINE_IDLE); Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14Add a reference to ucounts for each credAlexey Gladkov
[ Upstream commit 905ae01c4ae2ae3df05bb141801b1db4b7d83c61 ] For RLIMIT_NPROC and some other rlimits the user_struct that holds the global limit is kept alive for the lifetime of a process by keeping it in struct cred. Adding a pointer to ucounts in the struct cred will allow to track RLIMIT_NPROC not only for user in the system, but for user in the user_namespace. Updating ucounts may require memory allocation which may fail. So, we cannot change cred.ucounts in the commit_creds() because this function cannot fail and it should always return 0. For this reason, we modify cred.ucounts before calling the commit_creds(). Changelog v6: * Fix null-ptr-deref in is_ucounts_overlimit() detected by trinity. This error was caused by the fact that cred_alloc_blank() left the ucounts pointer empty. Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Alexey Gladkov <legion@kernel.org> Link: https://lkml.kernel.org/r/b37aaef28d8b9b0d757e07ba6dd27281bbe39259.1619094428.git.legion@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracingSteven Rostedt (VMware)
commit 9913d5745bd720c4266805c8d29952a3702e4eca upstream. All internal use cases for tracepoint_probe_register() is set to not ever be called with the same function and data. If it is, it is considered a bug, as that means the accounting of handling tracepoints is corrupted. If the function and data for a tracepoint is already registered when tracepoint_probe_register() is called, it will call WARN_ON_ONCE() and return with EEXISTS. The BPF system call can end up calling tracepoint_probe_register() with the same data, which now means that this can trigger the warning because of a user space process. As WARN_ON_ONCE() should not be called because user space called a system call with bad data, there needs to be a way to register a tracepoint without triggering a warning. Enter tracepoint_probe_register_may_exist(), which can be called, but will not cause a WARN_ON() if the probe already exists. It will still error out with EEXIST, which will then be sent to the user space that performed the BPF system call. This keeps the previous testing for issues with other users of the tracepoint code, while letting BPF call it with duplicated data and not warn about it. Link: https://lore.kernel.org/lkml/20210626135845.4080-1-penguin-kernel@I-love.SAKURA.ne.jp/ Link: https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153 Cc: stable@vger.kernel.org Fixes: c4f6699dfcb85 ("bpf: introduce BPF_RAW_TRACEPOINT") Reported-by: syzbot <syzbot+721aa903751db87aa244@syzkaller.appspotmail.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot+721aa903751db87aa244@syzkaller.appspotmail.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30certs: Add EFI_CERT_X509_GUID support for dbx entriesEric Snowberg
[ Upstream commit 56c5812623f95313f6a46fbf0beee7fa17c68bbf ] This fixes CVE-2020-26541. The Secure Boot Forbidden Signature Database, dbx, contains a list of now revoked signatures and keys previously approved to boot with UEFI Secure Boot enabled. The dbx is capable of containing any number of EFI_CERT_X509_SHA256_GUID, EFI_CERT_SHA256_GUID, and EFI_CERT_X509_GUID entries. Currently when EFI_CERT_X509_GUID are contained in the dbx, the entries are skipped. Add support for EFI_CERT_X509_GUID dbx entries. When a EFI_CERT_X509_GUID is found, it is added as an asymmetrical key to the .blacklist keyring. Anytime the .platform keyring is used, the keys in the .blacklist keyring are referenced, if a matching key is found, the key will be rejected. [DH: Made the following changes: - Added to have a config option to enable the facility. This allows a Kconfig solution to make sure that pkcs7_validate_trust() is enabled.[1][2] - Moved the functions out from the middle of the blacklist functions. - Added kerneldoc comments.] Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> cc: Randy Dunlap <rdunlap@infradead.org> cc: Mickaël Salaün <mic@digikod.net> cc: Arnd Bergmann <arnd@kernel.org> cc: keyrings@vger.kernel.org Link: https://lore.kernel.org/r/20200901165143.10295-1-eric.snowberg@oracle.com/ # rfc Link: https://lore.kernel.org/r/20200909172736.73003-1-eric.snowberg@oracle.com/ # v2 Link: https://lore.kernel.org/r/20200911182230.62266-1-eric.snowberg@oracle.com/ # v3 Link: https://lore.kernel.org/r/20200916004927.64276-1-eric.snowberg@oracle.com/ # v4 Link: https://lore.kernel.org/r/20210122181054.32635-2-eric.snowberg@oracle.com/ # v5 Link: https://lore.kernel.org/r/161428672051.677100.11064981943343605138.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/161433310942.902181.4901864302675874242.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161529605075.163428.14625520893961300757.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/bc2c24e3-ed68-2521-0bf4-a1f6be4a895d@infradead.org/ [1] Link: https://lore.kernel.org/r/20210225125638.1841436-1-arnd@kernel.org/ [2] Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-30mm, futex: fix shared futex pgoff on shmem huge pageHugh Dickins
commit fe19bd3dae3d15d2fbfdb3de8839a6ea0fe94264 upstream. If more than one futex is placed on a shmem huge page, it can happen that waking the second wakes the first instead, and leaves the second waiting: the key's shared.pgoff is wrong. When 3.11 commit 13d60f4b6ab5 ("futex: Take hugepages into account when generating futex_key"), the only shared huge pages came from hugetlbfs, and the code added to deal with its exceptional page->index was put into hugetlb source. Then that was missed when 4.8 added shmem huge pages. page_to_pgoff() is what others use for this nowadays: except that, as currently written, it gives the right answer on hugetlbfs head, but nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific hugetlb_basepage_index() on PageHuge tails as well as on head. Yes, it's unconventional to declare hugetlb_basepage_index() there in pagemap.h, rather than in hugetlb.h; but I do not expect anything but page_to_pgoff() ever to need it. [akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope] Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com Fixes: 800d8c63b2e9 ("shmem: add huge pages support") Reported-by: Neel Natu <neelnatu@google.com> Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Zhang Yi <wetpzy@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()Hugh Dickins
[ Upstream commit 22061a1ffabdb9c3385de159c5db7aac3a4df1cc ] There is a race between THP unmapping and truncation, when truncate sees pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared it, but before its page_remove_rmap() gets to decrement compound_mapcount: generating false "BUG: Bad page cache" reports that the page is still mapped when deleted. This commit fixes that, but not in the way I hoped. The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK) instead of unmap_mapping_range() in truncate_cleanup_page(): it has often been an annoyance that we usually call unmap_mapping_range() with no pages locked, but there apply it to a single locked page. try_to_unmap() looks more suitable for a single locked page. However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page): it is used to insert THP migration entries, but not used to unmap THPs. Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB needs are different, I'm too ignorant of the DAX cases, and couldn't decide how far to go for anon+swap. Set that aside. The second attempt took a different tack: make no change in truncate.c, but modify zap_huge_pmd() to insert an invalidated huge pmd instead of clearing it initially, then pmd_clear() between page_remove_rmap() and unlocking at the end. Nice. But powerpc blows that approach out of the water, with its serialize_against_pte_lookup(), and interesting pgtable usage. It would need serious help to get working on powerpc (with a minor optimization issue on s390 too). Set that aside. Just add an "if (page_mapped(page)) synchronize_rcu();" or other such delay, after unmapping in truncate_cleanup_page()? Perhaps, but though that's likely to reduce or eliminate the number of incidents, it would give less assurance of whether we had identified the problem correctly. This successful iteration introduces "unmap_mapping_page(page)" instead of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route, with an addition to details. Then zap_pmd_range() watches for this case, and does spin_unlock(pmd_lock) if so - just like page_vma_mapped_walk() now does in the PVMW_SYNC case. Not pretty, but safe. Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to assert its interface; but currently that's only used to make sure that page->mapping is stable, and zap_pmd_range() doesn't care if the page is locked or not. Along these lines, in invalidate_inode_pages2_range() move the initial unmap_mapping_range() out from under page lock, before then calling unmap_mapping_page() under page lock if still mapped. Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com Fixes: fc127da085c2 ("truncate: handle file thp") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Note on stable backport: fixed up call to truncate_cleanup_page() in truncate_inode_pages_range(). Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30mm/thp: try_to_unmap() use TTU_SYNC for safe splittingHugh Dickins
commit 732ed55823fc3ad998d43b86bf771887bcc5ec67 upstream. Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE (!unmap_success): with dump_page() showing mapcount:1, but then its raw struct page output showing _mapcount ffffffff i.e. mapcount 0. And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all indicative of some mapcount difficulty in development here perhaps. But the !CONFIG_DEBUG_VM path handles the failures correctly and silently. I believe the problem is that once a racing unmap has cleared pte or pmd, try_to_unmap_one() may skip taking the page table lock, and emerge from try_to_unmap() before the racing task has reached decrementing mapcount. Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC to the options, and passing that from unmap_page(). When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same for both: the slight overhead added should rarely matter, except perhaps if splitting sparsely-populated multiply-mapped shmem. Once confident that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated. Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30mm/thp: make is_huge_zero_pmd() safe and quickerHugh Dickins
commit 3b77e8c8cde581dadab9a0f1543a347e24315f11 upstream. Most callers of is_huge_zero_pmd() supply a pmd already verified present; but a few (notably zap_huge_pmd()) do not - it might be a pmd migration entry, in which the pfn is encoded differently from a present pmd: which might pass the is_huge_zero_pmd() test (though not on x86, since L1TF forced us to protect against that); or perhaps even crash in pmd_page() applied to a swap-like entry. Make it safe by adding pmd_present() check into is_huge_zero_pmd() itself; and make it quicker by saving huge_zero_pfn, so that is_huge_zero_pmd() will not need to do that pmd_page() lookup each time. __split_huge_pmd_locked() checked pmd_trans_huge() before: that worked, but is unnecessary now that is_huge_zero_pmd() checks present. Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp") Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jan Kara <jack@suse.cz> Cc: Jue Wang <juew@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Wang Yugui <wangyugui@e16-tech.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30mm: add VM_WARN_ON_ONCE_PAGE() macroAlex Shi
[ Upstream commit a4055888629bc0467d12d912cd7c90acdf3d9b12 part ] Add VM_WARN_ON_ONCE_PAGE() macro. Link: https://lkml.kernel.org/r/1604283436-18880-3-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Note on stable backport: original commit was titled mm/memcg: warning on !memcg after readahead page charged which included uses of this macro in mm/memcontrol.c: here omitted. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-30inet: annotate date races around sk->sk_txhashEric Dumazet
[ Upstream commit b71eaed8c04f72a919a9c44e83e4ee254e69e7f3 ] UDP sendmsg() path can be lockless, it is possible for another thread to re-connect an change sk->sk_txhash under us. There is no serious impact, but we can use READ_ONCE()/WRITE_ONCE() pair to document the race. BUG: KCSAN: data-race in __ip4_datagram_connect / skb_set_owner_w write to 0xffff88813397920c of 4 bytes by task 30997 on cpu 1: sk_set_txhash include/net/sock.h:1937 [inline] __ip4_datagram_connect+0x69e/0x710 net/ipv4/datagram.c:75 __ip6_datagram_connect+0x551/0x840 net/ipv6/datagram.c:189 ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272 inet_dgram_connect+0xfd/0x180 net/ipv4/af_inet.c:580 __sys_connect_file net/socket.c:1837 [inline] __sys_connect+0x245/0x280 net/socket.c:1854 __do_sys_connect net/socket.c:1864 [inline] __se_sys_connect net/socket.c:1861 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:1861 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88813397920c of 4 bytes by task 31039 on cpu 0: skb_set_hash_from_sk include/net/sock.h:2211 [inline] skb_set_owner_w+0x118/0x220 net/core/sock.c:2101 sock_alloc_send_pskb+0x452/0x4e0 net/core/sock.c:2359 sock_alloc_send_skb+0x2d/0x40 net/core/sock.c:2373 __ip6_append_data+0x1743/0x21a0 net/ipv6/ip6_output.c:1621 ip6_make_skb+0x258/0x420 net/ipv6/ip6_output.c:1983 udpv6_sendmsg+0x160a/0x16b0 net/ipv6/udp.c:1527 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:642 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xbca3c43d -> 0xfdb309e0 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 31039 Comm: syz-executor.2 Not tainted 5.13.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-30net: annotate data race in sock_error()Eric Dumazet
[ Upstream commit f13ef10059ccf5f4ed201cd050176df62ec25bb8 ] sock_error() is known to be racy. The code avoids an atomic operation is sk_err is zero, and this field could be changed under us, this is fine. Sysbot reported: BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock write to 0xffff888131855630 of 4 bytes by task 9365 on cpu 1: unix_release_sock+0x2e9/0x6e0 net/unix/af_unix.c:550 unix_release+0x2f/0x50 net/unix/af_unix.c:859 __sock_release net/socket.c:599 [inline] sock_close+0x6c/0x150 net/socket.c:1258 __fput+0x25b/0x4e0 fs/file_table.c:280 ____fput+0x11/0x20 fs/file_table.c:313 task_work_run+0xae/0x130 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301 do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888131855630 of 4 bytes by task 9385 on cpu 0: sock_error include/net/sock.h:2269 [inline] sock_alloc_send_pskb+0xe4/0x4e0 net/core/sock.c:2336 unix_dgram_sendmsg+0x478/0x1610 net/unix/af_unix.c:1671 unix_seqpacket_sendmsg+0xc2/0x100 net/unix/af_unix.c:2055 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350 __sys_sendmsg_sock+0x25/0x30 net/socket.c:2416 io_sendmsg fs/io_uring.c:4367 [inline] io_issue_sqe+0x231a/0x6750 fs/io_uring.c:6135 __io_queue_sqe+0xe9/0x360 fs/io_uring.c:6414 __io_req_task_submit fs/io_uring.c:2039 [inline] io_async_task_func+0x312/0x590 fs/io_uring.c:5074 __tctx_task_work fs/io_uring.c:1910 [inline] tctx_task_work+0x1d4/0x3d0 fs/io_uring.c:1924 task_work_run+0xae/0x130 kernel/task_work.c:164 tracehook_notify_signal include/linux/tracehook.h:212 [inline] handle_signal_work kernel/entry/common.c:145 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0xf8/0x190 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301 do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00000000 -> 0x00000068 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 9385 Comm: syz-executor.3 Not tainted 5.13.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-30locking/lockdep: Improve noinstr vs errorsPeter Zijlstra
[ Upstream commit 49faa77759b211fff344898edc23bb780707fff5 ] Better handle the failure paths. vmlinux.o: warning: objtool: debug_locks_off()+0x23: call to console_verbose() leaves .noinstr.text section vmlinux.o: warning: objtool: debug_locks_off()+0x19: call to __kasan_check_write() leaves .noinstr.text section debug_locks_off+0x19/0x40: instrument_atomic_write at include/linux/instrumented.h:86 (inlined by) __debug_locks_off at include/linux/debug_locks.h:17 (inlined by) debug_locks_off at lib/debug_locks.c:41 Fixes: 6eebad1ad303 ("lockdep: __always_inline more for noinstr") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210621120120.784404944@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-06-23mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when comparePeter Xu
commit 099dd6878b9b12d6bbfa6bf29ce0c8ddd38f6901 upstream. I found it by pure code review, that pte_same_as_swp() of unuse_vma() didn't take uffd-wp bit into account when comparing ptes. pte_same_as_swp() returning false negative could cause failure to swapoff swap ptes that was wr-protected by userfaultfd. Link: https://lkml.kernel.org/r/20210603180546.9083-1-peterx@redhat.com Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration") Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [5.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>