summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-08-21iommu/vt-d: Enforce PASID devTLB field maskLiu Yi L
[ Upstream commit 5f77d6ca5ca74e4b4a5e2e010f7ff50c45dea326 ] Set proper masks to avoid invalid input spillover to reserved bits. Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20200724014925.15523-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21genirq/affinity: Make affinity setting if activated opt-inThomas Gleixner
commit f0c7baca180046824e07fc5f1326e83a8fd150c7 upstream. John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis: "It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING. Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU." This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in. Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ... Fixes: baedb87d1b53 ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: John Keeping <john@metanate.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Marc Zyngier <maz@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19bitfield.h: don't compile-time validate _val in FIELD_FITJakub Kicinski
commit 444da3f52407d74c9aa12187ac6b01f76ee47d62 upstream. When ur_load_imm_any() is inlined into jeq_imm(), it's possible for the compiler to deduce a case where _val can only have the value of -1 at compile time. Specifically, /* struct bpf_insn: _s32 imm */ u64 imm = insn->imm; /* sign extend */ if (imm >> 32) { /* non-zero only if insn->imm is negative */ /* inlined from ur_load_imm_any */ u32 __imm = imm >> 32; /* therefore, always 0xffffffff */ if (__builtin_constant_p(__imm) && __imm > 255) compiletime_assert_XXX() This can result in tripping a BUILD_BUG_ON() in __BF_FIELD_CHECK() that checks that a given value is representable in one byte (interpreted as unsigned). FIELD_FIT() should return true or false at runtime for whether a value can fit for not. Don't break the build over a value that's too large for the mask. We'd prefer to keep the inlining and compiler optimizations though we know this case will always return false. Cc: stable@vger.kernel.org Fixes: 1697599ee301a ("bitfield.h: add FIELD_FIT() helper") Link: https://lore.kernel.org/kernel-hardening/CAK7LNASvb0UDJ0U5wkYYRzTAdnEs64HjXpEUL7d=V0CXiAXcNw@mail.gmail.com/ Reported-by: Masahiro Yamada <masahiroy@kernel.org> Debugged-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19tpm: Unify the mismatching TPM space buffer sizesJarkko Sakkinen
commit 6c4e79d99e6f42b79040f1a33cd4018f5425030b upstream. The size of the buffers for storing context's and sessions can vary from arch to arch as PAGE_SIZE can be anything between 4 kB and 256 kB (the maximum for PPC64). Define a fixed buffer size set to 16 kB. This should be enough for most use with three handles (that is how many we allow at the moment). Parametrize the buffer size while doing this, so that it is easier to revisit this later on if required. Cc: stable@vger.kernel.org Reported-by: Stefan Berger <stefanb@linux.ibm.com> Fixes: 745b361e989a ("tpm: infrastructure for TPM spaces") Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-19tpm: Require that all digests are present in TCG_PCR_EVENT2 structuresTyler Hicks
[ Upstream commit 7f3d176f5f7e3f0477bf82df0f600fcddcdcc4e4 ] Require that the TCG_PCR_EVENT2.digests.count value strictly matches the value of TCG_EfiSpecIdEvent.numberOfAlgorithms in the event field of the TCG_PCClientPCREvent event log header. Also require that TCG_EfiSpecIdEvent.numberOfAlgorithms is non-zero. The TCG PC Client Platform Firmware Profile Specification section 9.1 (Family "2.0", Level 00 Revision 1.04) states: For each Hash algorithm enumerated in the TCG_PCClientPCREvent entry, there SHALL be a corresponding digest in all TCG_PCR_EVENT2 structures. Note: This includes EV_NO_ACTION events which do not extend the PCR. Section 9.4.5.1 provides this description of TCG_EfiSpecIdEvent.numberOfAlgorithms: The number of Hash algorithms in the digestSizes field. This field MUST be set to a value of 0x01 or greater. Enforce these restrictions, as required by the above specification, in order to better identify and ignore invalid sequences of bytes at the end of an otherwise valid TPM2 event log. Firmware doesn't always have the means necessary to inform the kernel of the actual event log size so the kernel's event log parsing code should be stringent when parsing the event log for resiliency against firmware bugs. This is true, for example, when firmware passes the event log to the kernel via a reserved memory region described in device tree. POWER and some ARM systems use the "linux,sml-base" and "linux,sml-size" device tree properties to describe the memory region used to pass the event log from firmware to the kernel. Unfortunately, the "linux,sml-size" property describes the size of the entire reserved memory region rather than the size of the event long within the memory region and the event log format does not include information describing the size of the event log. tpm_read_log_of(), in drivers/char/tpm/eventlog/of.c, is where the "linux,sml-size" property is used. At the end of that function, log->bios_event_log_end is pointing at the end of the reserved memory region. That's typically 0x10000 bytes offset from "linux,sml-base", depending on what's defined in the device tree source. The firmware event log only fills a portion of those 0x10000 bytes and the rest of the memory region should be zeroed out by firmware. Even in the case of a properly zeroed bytes in the remainder of the memory region, the only thing allowing the kernel's event log parser to detect the end of the event log is the following conditional in __calc_tpm2_event_size(): if (event_type == 0 && event_field->event_size == 0) size = 0; If that wasn't there, __calc_tpm2_event_size() would think that a 16 byte sequence of zeroes, following an otherwise valid event log, was a valid event. However, problems can occur if a single bit is set in the offset corresponding to either the TCG_PCR_EVENT2.eventType or TCG_PCR_EVENT2.eventSize fields, after the last valid event log entry. This could confuse the parser into thinking that an additional entry is present in the event log and exposing this invalid entry to userspace in the /sys/kernel/security/tpm0/binary_bios_measurements file. Such problems have been seen if firmware does not fully zero the memory region upon a warm reboot. This patch significantly raises the bar on how difficult it is for stale/invalid memory to confuse the kernel's event log parser but there's still, ultimately, a reliance on firmware to properly initialize the remainder of the memory region reserved for the event log as the parser cannot be expected to detect a stale but otherwise properly formatted firmware event log entry. Fixes: fd5c78694f3f ("tpm: fix handling of the TPM 2.0 event logs") Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-19tracepoint: Mark __tracepoint_string's __usedNick Desaulniers
commit f3751ad0116fb6881f2c3c957d66a9327f69cefb upstream. __tracepoint_string's have their string data stored in .rodata, and an address to that data stored in the "__tracepoint_str" section. Functions that refer to those strings refer to the symbol of the address. Compiler optimization can replace those address references with references directly to the string data. If the address doesn't appear to have other uses, then it appears dead to the compiler and is removed. This can break the /tracing/printk_formats sysfs node which iterates the addresses stored in the "__tracepoint_str" section. Like other strings stored in custom sections in this header, mark these __used to inform the compiler that there are other non-obvious users of the address, so they should still be emitted. Link: https://lkml.kernel.org/r/20200730224555.2142154-2-ndesaulniers@google.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: stable@vger.kernel.org Fixes: 102c9323c35a8 ("tracing: Add __tracepoint_string() to export string pointers") Reported-by: Tim Murray <timmurray@google.com> Reported-by: Simon MacMullen <simonmacm@google.com> Suggested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-11nfsd: Fix NFSv4 READ on RDMA when using readvChuck Lever
commit 412055398b9e67e07347a936fc4a6adddabe9cf4 upstream. svcrdma expects that the payload falls precisely into the xdr_buf page vector. This does not seem to be the case for nfsd4_encode_readv(). This code is called only when fops->splice_read is missing or when RQ_SPLICE_OK is clear, so it's not a noticeable problem in many common cases. Add new transport method: ->xpo_read_payload so that when a READ payload does not fit exactly in rq_res's page vector, the XDR encoder can inform the RPC transport exactly where that payload is, without the payload's XDR pad. That way, when a Write chunk is present, the transport knows what byte range in the Reply message is supposed to be matched with the chunk. Note that the Linux NFS server implementation of NFS/RDMA can currently handle only one Write chunk per RPC-over-RDMA message. This simplifies the implementation of this fix. Fixes: b04209806384 ("nfsd4: allow exotic read compounds") Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Timo Rothenpieler <timo@rothenpieler.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-11xattr: break delegations in {set,remove}xattrFrank van der Linden
commit 08b5d5014a27e717826999ad20e394a8811aae92 upstream. set/removexattr on an exported filesystem should break NFS delegations. This is true in general, but also for the upcoming support for RFC 8726 (NFSv4 extended attribute support). Make sure that they do. Additionally, they need to grow a _locked variant, since callers might call this with i_rwsem held (like the NFS server code). Cc: stable@vger.kernel.org # v4.9+ Cc: linux-fsdevel@vger.kernel.org Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-11Drivers: hv: vmbus: Ignore CHANNELMSG_TL_CONNECT_RESULT(23)Dexuan Cui
[ Upstream commit ddc9d357b991838c2d975e8d7e4e9db26f37a7ff ] When a Linux hv_sock app tries to connect to a Service GUID on which no host app is listening, a recent host (RS3+) sends a CHANNELMSG_TL_CONNECT_RESULT (23) message to Linux and this triggers such a warning: unknown msgtype=23 WARNING: CPU: 2 PID: 0 at drivers/hv/vmbus_drv.c:1031 vmbus_on_msg_dpc Actually Linux can safely ignore the message because the Linux app's connect() will time out in 2 seconds: see VSOCK_DEFAULT_CONNECT_TIMEOUT and vsock_stream_connect(). We don't bother to make use of the message because: 1) it's only supported on recent hosts; 2) a non-trivial effort is required to use the message in Linux, but the benefit is small. So, let's not see the warning by silently ignoring the message. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-07bpf: sockmap: Require attach_bpf_fd when detaching a programLorenz Bauer
commit bb0de3131f4c60a9bf976681e0fe4d1e55c7a821 upstream. The sockmap code currently ignores the value of attach_bpf_fd when detaching a program. This is contrary to the usual behaviour of checking that attach_bpf_fd represents the currently attached program. Ensure that attach_bpf_fd is indeed the currently attached program. It turns out that all sockmap selftests already do this, which indicates that this is unlikely to cause breakage. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-07random32: move the pseudo-random 32-bit definitions to prandom.hLinus Torvalds
commit c0842fbc1b18c7a044e6ff3e8fa78bfa822c7d1a upstream. The addition of percpu.h to the list of includes in random.h revealed some circular dependencies on arm64 and possibly other platforms. This include was added solely for the pseudo-random definitions, which have nothing to do with the rest of the definitions in this file but are still there for legacy reasons. This patch moves the pseudo-random parts to linux/prandom.h and the percpu.h include with it, which is now guarded by _LINUX_PRANDOM_H and protected against recursive inclusion. A further cleanup step would be to remove this from <linux/random.h> entirely, and make people who use the prandom infrastructure include just the new header file. That's a bit of a churn patch, but grepping for "prandom_" and "next_pseudo_random32" "struct rnd_state" should catch most users. But it turns out that that nice cleanup step is fairly painful, because a _lot_ of code currently seems to depend on the implicit include of <linux/random.h>, which can currently come in a lot of ways, including such fairly core headfers as <linux/net.h>. So the "nice cleanup" part may or may never happen. Fixes: 1c9df907da83 ("random: fix circular include dependency on arm64 after addition of percpu.h") Tested-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-07random32: remove net_rand_state from the latent entropy gcc pluginLinus Torvalds
commit 83bdc7275e6206f560d247be856bceba3e1ed8f2 upstream. It turns out that the plugin right now ends up being really unhappy about the change from 'static' to 'extern' storage that happened in commit f227e3ec3b5c ("random32: update the net random state on interrupt and activity"). This is probably a trivial fix for the latent_entropy plugin, but for now, just remove net_rand_state from the list of things the plugin worries about. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Emese Revfy <re.emese@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-07random: fix circular include dependency on arm64 after addition of percpu.hWilly Tarreau
commit 1c9df907da83812e4f33b59d3d142c864d9da57f upstream. Daniel Díaz and Kees Cook independently reported that commit f227e3ec3b5c ("random32: update the net random state on interrupt and activity") broke arm64 due to a circular dependency on include files since the addition of percpu.h in random.h. The correct fix would definitely be to move all the prandom32 stuff out of random.h but for backporting, a smaller solution is preferred. This one replaces linux/percpu.h with asm/percpu.h, and this fixes the problem on x86_64, arm64, arm, and mips. Note that moving percpu.h around didn't change anything and that removing it entirely broke differently. When backporting, such options might still be considered if this patch fails to help. [ It turns out that an alternate fix seems to be to just remove the troublesome <asm/pointer_auth.h> remove from the arm64 <asm/smp.h> that causes the circular dependency. But we might as well do the whole belt-and-suspenders thing, and minimize inclusion in <linux/random.h> too. Either will fix the problem, and both are good changes. - Linus ] Reported-by: Daniel Díaz <daniel.diaz@linaro.org> Reported-by: Kees Cook <keescook@chromium.org> Tested-by: Marc Zyngier <maz@kernel.org> Fixes: f227e3ec3b5c Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-07random32: update the net random state on interrupt and activityWilly Tarreau
commit f227e3ec3b5cad859ad15666874405e8c1bbc1d4 upstream. This modifies the first 32 bits out of the 128 bits of a random CPU's net_rand_state on interrupt or CPU activity to complicate remote observations that could lead to guessing the network RNG's internal state. Note that depending on some network devices' interrupt rate moderation or binding, this re-seeding might happen on every packet or even almost never. In addition, with NOHZ some CPUs might not even get timer interrupts, leaving their local state rarely updated, while they are running networked processes making use of the random state. For this reason, we also perform this update in update_process_times() in order to at least update the state when there is user or system activity, since it's the only case we care about. Reported-by: Amit Klein <aksecurity@gmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Dumazet <edumazet@google.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-08-05rhashtable: Fix unprotected RCU dereference in __rht_ptrHerbert Xu
[ Upstream commit 1748f6a2cbc4694523f16da1c892b59861045b9d ] The rcu_dereference call in rht_ptr_rcu is completely bogus because we've already dereferenced the value in __rht_ptr and operated on it. This causes potential double readings which could be fatal. The RCU dereference must occur prior to the comparison in __rht_ptr. This patch changes the order of RCU dereference so that it is done first and the result is then fed to __rht_ptr. The RCU marking changes have been minimised using casts which will be removed in a follow-up patch. Fixes: ba6306e3f648 ("rhashtable: Remove RCU marking from...") Reported-by: "Gong, Sishuai" <sishuai@purdue.edu> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-05net/mlx5e: Modify uplink state on interface up/downRon Diskin
[ Upstream commit 7d0314b11cdd92bca8b89684c06953bf114605fc ] When setting the PF interface up/down, notify the firmware to update uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled. This behavior will prevent sending traffic out on uplink port when PF is down, such as sending traffic from a VF interface which is still up. Currently when calling mlx5e_open/close(), the driver only sends PAOS command to notify the firmware to set the physical port state to up/down, however, it is not sufficient. When VF is in "auto" state, it follows the uplink state, which was not updated on mlx5e_open/close() before this patch. When switchdev mode is enabled and uplink representor is first enabled, set the uplink port state value back to its FW default "AUTO". Fixes: 63bfd399de55 ("net/mlx5e: Send PAOS command on interface up/down") Signed-off-by: Ron Diskin <rondi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-31tcp: allow at most one TLP probe per flightYuchung Cheng
[ Upstream commit 76be93fc0702322179bb0ea87295d820ee46ad14 ] Previously TLP may send multiple probes of new data in one flight. This happens when the sender is cwnd limited. After the initial TLP containing new data is sent, the sender receives another ACK that acks partial inflight. It may re-arm another TLP timer to send more, if no further ACK returns before the next TLP timeout (PTO) expires. The sender may send in theory a large amount of TLP until send queue is depleted. This only happens if the sender sees such irregular uncommon ACK pattern. But it is generally undesirable behavior during congestion especially. The original TLP design restrict only one TLP probe per inflight as published in "Reducing Web Latency: the Virtue of Gentle Aggression", SIGCOMM 2013. This patch changes TLP to send at most one probe per inflight. Note that if the sender is app-limited, TLP retransmits old data and did not have this issue. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29dm integrity: fix integrity recalculation that is improperly skippedMikulas Patocka
commit 5df96f2b9f58a5d2dc1f30fe7de75e197f2c25f2 upstream. Commit adc0daad366b62ca1bce3e2958a40b0b71a8b8b3 ("dm: report suspended device during destroy") broke integrity recalculation. The problem is dm_suspended() returns true not only during suspend, but also during resume. So this race condition could occur: 1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work) 2. integrity_recalc (&ic->recalc_work) preempts the current thread 3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret; 4. integrity_recalc exits and no recalculating is done. To fix this race condition, add a function dm_post_suspending that is only true during the postsuspend phase and use it instead of dm_suspended(). Signed-off-by: Mikulas Patocka <mpatocka redhat com> Fixes: adc0daad366b ("dm: report suspended device during destroy") Cc: stable vger kernel org # v4.18+ Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29io-mapping: indicate mapping failureMichael J. Ruhl
commit e0b3e0b1a04367fc15c07f44e78361545b55357c upstream. The !ATOMIC_IOMAP version of io_maping_init_wc will always return success, even when the ioremap fails. Since the ATOMIC_IOMAP version returns NULL when the init fails, and callers check for a NULL return on error this is unexpected. During a device probe, where the ioremap failed, a crash can look like this: BUG: unable to handle page fault for address: 0000000000210000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page Oops: 0002 [#1] PREEMPT SMP CPU: 0 PID: 177 Comm: RIP: 0010:fill_page_dma [i915] gen8_ppgtt_create [i915] i915_ppgtt_create [i915] intel_gt_init [i915] i915_gem_init [i915] i915_driver_probe [i915] pci_device_probe really_probe driver_probe_device The remap failure occurred much earlier in the probe. If it had been propagated, the driver would have exited with an error. Return NULL on ioremap failure. [akpm@linux-foundation.org: detect ioremap_wc() errors earlier] Fixes: cafaf14a5d8f ("io-mapping: Always create a struct to hold metadata about the io-mapping") Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200721171936.81563-1-michael.j.ruhl@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-29Input: add `SW_MACHINE_COVER`Merlijn Wajer
[ Upstream commit c463bb2a8f8d7d97aa414bf7714fc77e9d3b10df ] This event code represents the state of a removable cover of a device. Value 0 means that the cover is open or removed, value 1 means that the cover is closed. Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Merlijn Wajer <merlijn@wizzup.org> Link: https://lore.kernel.org/r/20200612125402.18393-2-merlijn@wizzup.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-29dmabuf: use spinlock to access dmabuf->nameCharan Teja Kalla
[ Upstream commit 6348dd291e3653534a9e28e6917569bc9967b35b ] There exists a sleep-while-atomic bug while accessing the dmabuf->name under mutex in the dmabuffs_dname(). This is caused from the SELinux permissions checks on a process where it tries to validate the inherited files from fork() by traversing them through iterate_fd() (which traverse files under spin_lock) and call match_file(security/selinux/hooks.c) where the permission checks happen. This audit information is logged using dump_common_audit_data() where it calls d_path() to get the file path name. If the file check happen on the dmabuf's fd, then it ends up in ->dmabuffs_dname() and use mutex to access dmabuf->name. The flow will be like below: flush_unauthorized_files() iterate_fd() spin_lock() --> Start of the atomic section. match_file() file_has_perm() avc_has_perm() avc_audit() slow_avc_audit() common_lsm_audit() dump_common_audit_data() audit_log_d_path() d_path() dmabuffs_dname() mutex_lock()--> Sleep while atomic. Call trace captured (on 4.19 kernels) is below: ___might_sleep+0x204/0x208 __might_sleep+0x50/0x88 __mutex_lock_common+0x5c/0x1068 __mutex_lock_common+0x5c/0x1068 mutex_lock_nested+0x40/0x50 dmabuffs_dname+0xa0/0x170 d_path+0x84/0x290 audit_log_d_path+0x74/0x130 common_lsm_audit+0x334/0x6e8 slow_avc_audit+0xb8/0xf8 avc_has_perm+0x154/0x218 file_has_perm+0x70/0x180 match_file+0x60/0x78 iterate_fd+0x128/0x168 selinux_bprm_committing_creds+0x178/0x248 security_bprm_committing_creds+0x30/0x48 install_exec_creds+0x1c/0x68 load_elf_binary+0x3a4/0x14e0 search_binary_handler+0xb0/0x1e0 So, use spinlock to access dmabuf->name to avoid sleep-while-atomic. Cc: <stable@vger.kernel.org> [5.3+] Signed-off-by: Charan Teja Kalla <charante@codeaurora.org> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Acked-by: Christian König <christian.koenig@amd.com> [sumits: added comment to spinlock_t definition to avoid warning] Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/a83e7f0d-4e54-9848-4b58-e1acdbe06735@codeaurora.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22Input: elan_i2c - add more hardware ID for Lenovo laptopsDave Wang
commit a50ca29523b18baea548bdf5df9b4b923c2bb4f6 upstream. This adds more hardware IDs for Elan touchpads found in various Lenovo laptops. Signed-off-by: Dave Wang <dave.wang@emc.com.tw> Link: https://lore.kernel.org/r/000201d5a8bd$9fead3f0$dfc07bd0$@emc.com.tw Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22bus: ti-sysc: Handle module unlock quirk needed for some RTCTony Lindgren
[ Upstream commit e8639e1c986a8a9d0f94549170f6db579376c3ae ] The RTC modules on am3 and am4 need quirk handling to unlock and lock them for reset so let's add the quirk handling based on what we already have for legacy platform data. In later patches we will simply drop the RTC related platform data and the old quirk handling. Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22blk-mq-debugfs: update blk_queue_flag_name[] accordingly for new flagsHou Tao
[ Upstream commit bfe373f608cf81b7626dfeb904001b0e867c5110 ] Else there may be magic numbers in /sys/kernel/debug/block/*/state. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-07-22cgroup: Fix sock_cgroup_data on big-endian.Cong Wang
[ Upstream commit 14b032b8f8fce03a546dcf365454bec8c4a58d7d ] In order for no_refcnt and is_data to be the lowest order two bits in the 'val' we have to pad out the bitfield of the u8. Fixes: ad0f75e5f57c ("cgroup: fix cgroup_sk_alloc() for sk_clone_lock()") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22cgroup: fix cgroup_sk_alloc() for sk_clone_lock()Cong Wang
[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ] When we clone a socket in sk_clone_lock(), its sk_cgrp_data is copied, so the cgroup refcnt must be taken too. And, unlike the sk_alloc() path, sock_update_netprioidx() is not called here. Therefore, it is safe and necessary to grab the cgroup refcnt even when cgroup_sk_alloc is disabled. sk_clone_lock() is in BH context anyway, the in_interrupt() would terminate this function if called there. And for sk_alloc() skcd->val is always zero. So it's safe to factor out the code to make it more readable. The global variable 'cgroup_sk_alloc_disabled' is used to determine whether to take these reference counts. It is impossible to make the reference counting correct unless we save this bit of information in skcd->val. So, add a new bit there to record whether the socket has already taken the reference counts. This obviously relies on kmalloc() to align cgroup pointers to at least 4 bytes, ARCH_KMALLOC_MINALIGN is certainly larger than that. This bug seems to be introduced since the beginning, commit d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets") tried to fix it but not compeletely. It seems not easy to trigger until the recent commit 090e28b229af ("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged. Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") Reported-by: Cameron Berkenpas <cam@neo-zeon.de> Reported-by: Peter Geis <pgwipeout@gmail.com> Reported-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reported-by: Daniël Sonck <dsonck92@gmail.com> Reported-by: Zhang Qiang <qiang.zhang@windriver.com> Tested-by: Cameron Berkenpas <cam@neo-zeon.de> Tested-by: Peter Geis <pgwipeout@gmail.com> Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Zefan Li <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22vlan: consolidate VLAN parsing code and limit max parsing depthToke Høiland-Jørgensen
[ Upstream commit 469aceddfa3ed16e17ee30533fae45e90f62efd8 ] Toshiaki pointed out that we now have two very similar functions to extract the L3 protocol number in the presence of VLAN tags. And Daniel pointed out that the unbounded parsing loop makes it possible for maliciously crafted packets to loop through potentially hundreds of tags. Fix both of these issues by consolidating the two parsing functions and limiting the VLAN tag parsing to a max depth of 8 tags. As part of this, switch over __vlan_get_protocol() to use skb_header_pointer() instead of pskb_may_pull(), to avoid the possible side effects of the latter and keep the skb pointer 'const' through all the parsing functions. v2: - Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT) Reported-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Reported-by: Daniel Borkmann <daniel@iogearbox.net> Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in the presence of VLANs") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-22sched: consistently handle layer3 header accesses in the presence of VLANsToke Høiland-Jørgensen
[ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ] There are a couple of places in net/sched/ that check skb->protocol and act on the value there. However, in the presence of VLAN tags, the value stored in skb->protocol can be inconsistent based on whether VLAN acceleration is enabled. The commit quoted in the Fixes tag below fixed the users of skb->protocol to use a helper that will always see the VLAN ethertype. However, most of the callers don't actually handle the VLAN ethertype, but expect to find the IP header type in the protocol field. This means that things like changing the ECN field, or parsing diffserv values, stops working if there's a VLAN tag, or if there are multiple nested VLAN tags (QinQ). To fix this, change the helper to take an argument that indicates whether the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we make sure to skip all of them, so behaviour is consistent even in QinQ mode. To make the helper usable from the ECN code, move it to if_vlan.h instead of pkt_sched.h. v3: - Remove empty lines - Move vlan variable definitions inside loop in skb_protocol() - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and bpf_skb_ecn_set_ce() v2: - Use eth_type_vlan() helper in skb_protocol() - Also fix code that reads skb->protocol directly - Change a couple of 'if/else if' statements to switch constructs to avoid calling the helper twice Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-16bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()Kees Cook
commit 63960260457a02af2a6cb35d75e6bdb17299c882 upstream. When evaluating access control over kallsyms visibility, credentials at open() time need to be used, not the "current" creds (though in BPF's case, this has likely always been the same). Plumb access to associated file->f_cred down through bpf_dump_raw_ok() and its callers now that kallsysm_show_value() has been refactored to take struct cred. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: bpf@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-07-16kallsyms: Refactor kallsyms_show_value() to take credKees Cook
commit 160251842cd35a75edfb0a1d76afa3eb674ff40a upstream. In order to perform future tests against the cred saved during open(), switch kallsyms_show_value() to operate on a cred, and have all current callers pass current_cred(). This makes it very obvious where callers are checking the wrong credential in their "read" contexts. These will be fixed in the coming patches. Additionally switch return value to bool, since it is always used as a direct permission check, not a 0-on-success, negative-on-error style function return. Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30net: qed: fix left elements count calculationAlexander Lobakin
[ Upstream commit 97dd1abd026ae4e6a82fa68645928404ad483409 ] qed_chain_get_element_left{,_u32} returned 0 when the difference between producer and consumer page count was equal to the total page count. Fix this by conditional expanding of producer value (vs unconditional). This allowed to eliminate normalizaton against total page count, which was the cause of this bug. Misc: replace open-coded constants with common defines. Fixes: a91eb52abb50 ("qed: Revisit chain implementation") Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30efi/tpm: Verify event log header before parsingFabian Vogt
[ Upstream commit 7dfc06a0f25b593a9f51992f540c0f80a57f3629 ] It is possible that the first event in the event log is not actually a log header at all, but rather a normal event. This leads to the cast in __calc_tpm2_event_size being an invalid conversion, which means that the values read are effectively garbage. Depending on the first event's contents, this leads either to apparently normal behaviour, a crash or a freeze. While this behaviour of the firmware is not in accordance with the TCG Client EFI Specification, this happens on a Dell Precision 5510 with the TPM enabled but hidden from the OS ("TPM On" disabled, state otherwise untouched). The EFI firmware claims that the TPM is present and active and that it supports the TCG 2.0 event log format. Fortunately, this can be worked around by simply checking the header of the first event and the event log header signature itself. Commit b4f1874c6216 ("tpm: check event log version before reading final events") addressed a similar issue also found on Dell models. Fixes: 6b0326190205 ("efi: Attempt to get the TCG2 event log in the boot stub") Signed-off-by: Fabian Vogt <fvogt@suse.de> Link: https://lore.kernel.org/r/1927248.evlx2EsYKh@linux-e202.suse.de Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1165773 Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30net: core: reduce recursion limit valueTaehee Yoo
[ Upstream commit fb7861d14c8d7edac65b2fcb6e8031cb138457b2 ] In the current code, ->ndo_start_xmit() can be executed recursively only 10 times because of stack memory. But, in the case of the vxlan, 10 recursion limit value results in a stack overflow. In the current code, the nested interface is limited by 8 depth. There is no critical reason that the recursion limitation value should be 10. So, it would be good to be the same value with the limitation value of nesting interface depth. Test commands: ip link add vxlan10 type vxlan vni 10 dstport 4789 srcport 4789 4789 ip link set vxlan10 up ip a a 192.168.10.1/24 dev vxlan10 ip n a 192.168.10.2 dev vxlan10 lladdr fc:22:33:44:55:66 nud permanent for i in {9..0} do let A=$i+1 ip link add vxlan$i type vxlan vni $i dstport 4789 srcport 4789 4789 ip link set vxlan$i up ip a a 192.168.$i.1/24 dev vxlan$i ip n a 192.168.$i.2 dev vxlan$i lladdr fc:22:33:44:55:66 nud permanent bridge fdb add fc:22:33:44:55:66 dev vxlan$A dst 192.168.$i.2 self done hping3 192.168.10.2 -2 -d 60000 Splat looks like: [ 103.814237][ T1127] ============================================================================= [ 103.871955][ T1127] BUG kmalloc-2k (Tainted: G B ): Padding overwritten. 0x00000000897a2e4f-0x000 [ 103.873187][ T1127] ----------------------------------------------------------------------------- [ 103.873187][ T1127] [ 103.874252][ T1127] INFO: Slab 0x000000005cccc724 objects=5 used=5 fp=0x0000000000000000 flags=0x10000000001020 [ 103.881323][ T1127] CPU: 3 PID: 1127 Comm: hping3 Tainted: G B 5.7.0+ #575 [ 103.882131][ T1127] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 103.883006][ T1127] Call Trace: [ 103.883324][ T1127] dump_stack+0x96/0xdb [ 103.883716][ T1127] slab_err+0xad/0xd0 [ 103.884106][ T1127] ? _raw_spin_unlock+0x1f/0x30 [ 103.884620][ T1127] ? get_partial_node.isra.78+0x140/0x360 [ 103.885214][ T1127] slab_pad_check.part.53+0xf7/0x160 [ 103.885769][ T1127] ? pskb_expand_head+0x110/0xe10 [ 103.886316][ T1127] check_slab+0x97/0xb0 [ 103.886763][ T1127] alloc_debug_processing+0x84/0x1a0 [ 103.887308][ T1127] ___slab_alloc+0x5a5/0x630 [ 103.887765][ T1127] ? pskb_expand_head+0x110/0xe10 [ 103.888265][ T1127] ? lock_downgrade+0x730/0x730 [ 103.888762][ T1127] ? pskb_expand_head+0x110/0xe10 [ 103.889244][ T1127] ? __slab_alloc+0x3e/0x80 [ 103.889675][ T1127] __slab_alloc+0x3e/0x80 [ 103.890108][ T1127] __kmalloc_node_track_caller+0xc7/0x420 [ ... ] Fixes: 11a766ce915f ("net: Increase xmit RECURSION_LIMIT to 10.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24kretprobe: Prevent triggering kretprobe from within kprobe_flush_taskJiri Olsa
commit 9b38cc704e844e41d9cf74e647bff1d249512cb3 upstream. Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave. My test was also able to trigger lockdep output: ============================================ WARNING: possible recursive locking detected 5.6.0-rc6+ #6 Not tainted -------------------------------------------- sched-messaging/2767 is trying to acquire lock: ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0 but task is already holding lock: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(kretprobe_table_locks[i].lock)); lock(&(kretprobe_table_locks[i].lock)); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by sched-messaging/2767: #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50 stack backtrace: CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6 Call Trace: dump_stack+0x96/0xe0 __lock_acquire.cold.57+0x173/0x2b7 ? native_queued_spin_lock_slowpath+0x42b/0x9e0 ? lockdep_hardirqs_on+0x590/0x590 ? __lock_acquire+0xf63/0x4030 lock_acquire+0x15a/0x3d0 ? kretprobe_hash_lock+0x52/0xa0 _raw_spin_lock_irqsave+0x36/0x70 ? kretprobe_hash_lock+0x52/0xa0 kretprobe_hash_lock+0x52/0xa0 trampoline_handler+0xf8/0x940 ? kprobe_fault_handler+0x380/0x380 ? find_held_lock+0x3a/0x1c0 kretprobe_trampoline+0x25/0x50 ? lock_acquired+0x392/0xbc0 ? _raw_spin_lock_irqsave+0x50/0x70 ? __get_valid_kprobe+0x1f0/0x1f0 ? _raw_spin_unlock_irqrestore+0x3b/0x40 ? finish_task_switch+0x4b9/0x6d0 ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 The code within the kretprobe handler checks for probe reentrancy, so we won't trigger any _raw_spin_lock_irqsave probe in there. The problem is in outside kprobe_flush_task, where we call: kprobe_flush_task kretprobe_table_lock raw_spin_lock_irqsave _raw_spin_lock_irqsave where _raw_spin_lock_irqsave triggers the kretprobe and installs kretprobe_trampoline handler on _raw_spin_lock_irqsave return. The kretprobe_trampoline handler is then executed with already locked kretprobe_table_locks, and first thing it does is to lock kretprobe_table_locks ;-) the whole lockup path like: kprobe_flush_task kretprobe_table_lock raw_spin_lock_irqsave _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed ---> kretprobe_table_locks locked kretprobe_trampoline trampoline_handler kretprobe_hash_lock(current, &head, &flags); <--- deadlock Adding kprobe_busy_begin/end helpers that mark code with fake probe installed to prevent triggering of another kprobe within this code. Using these helpers in kprobe_flush_task, so the probe recursion protection check is hit and the probe is never set to prevent above lockup. Link: http://lkml.kernel.org/r/158927059835.27680.7011202830041561604.stgit@devnote2 Fixes: ef53d9c5e4da ("kprobes: improve kretprobe scalability with hashed locking") Cc: Ingo Molnar <mingo@kernel.org> Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-24block: nr_sects_write(): Disable preemption on seqcount writeAhmed S. Darwish
[ Upstream commit 15b81ce5abdc4b502aa31dff2d415b79d2349d2f ] For optimized block readers not holding a mutex, the "number of sectors" 64-bit value is protected from tearing on 32-bit architectures by a sequence counter. Disable preemption before entering that sequence counter's write side critical section. Otherwise, the read side can preempt the write side section and spin for the entire scheduler tick. If the reader belongs to a real-time scheduling class, it can spin forever and the kernel will livelock. Fixes: c83f6bf98dc1 ("block: add partition resize function to blkpg ioctl") Cc: <stable@vger.kernel.org> Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft()zhangyi (F)
[ Upstream commit 7f6225e446cc8dfa4c3c7959a4de3dd03ec277bf ] __jbd2_journal_abort_hard() is no longer used, so now we can merge __jbd2_journal_abort_hard() and __journal_abort_soft() these two functions into jbd2_journal_abort() and remove them. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191204124614.45424-5-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24libata: Use per port sync for detachKai-Heng Feng
[ Upstream commit b5292111de9bb70cba3489075970889765302136 ] Commit 130f4caf145c ("libata: Ensure ata_port probe has completed before detach") may cause system freeze during suspend. Using async_synchronize_full() in PM callbacks is wrong, since async callbacks that are already scheduled may wait for not-yet-scheduled callbacks, causes a circular dependency. Instead of using big hammer like async_synchronize_full(), use async cookie to make sure port probe are synced, without affecting other scheduled PM callbacks. Fixes: 130f4caf145c ("libata: Ensure ata_port probe has completed before detach") Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: John Garry <john.garry@huawei.com> BugLink: https://bugs.launchpad.net/bugs/1867983 Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24usb: host: ehci-platform: add a quirk to avoid stuckYoshihiro Shimoda
[ Upstream commit cc7eac1e4afdd151085be4d0341a155760388653 ] Since EHCI/OHCI controllers on R-Car Gen3 SoCs are possible to be getting stuck very rarely after a full/low usb device was disconnected. To detect/recover from such a situation, the controllers require a special way which poll the EHCI PORTSC register and changes the OHCI functional state. So, this patch adds a polling timer into the ehci-platform driver, and if the ehci driver detects the issue by the EHCI PORTSC register, the ehci driver removes a companion device (= the OHCI controller) to change the OHCI functional state to USB Reset once. And then, the ehci driver adds the companion device again. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/1580114262-25029-1-git-send-email-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24nfs: set invalid blocks after NFSv4 writesZheng Bin
[ Upstream commit 3a39e778690500066b31fe982d18e2e394d3bce2 ] Use the following command to test nfsv4(size of file1M is 1MB): mount -t nfs -o vers=4.0,actimeo=60 127.0.0.1/dir1 /mnt cp file1M /mnt du -h /mnt/file1M -->0 within 60s, then 1M When write is done(cp file1M /mnt), will call this: nfs_writeback_done nfs4_write_done nfs4_write_done_cb nfs_writeback_update_inode nfs_post_op_update_inode_force_wcc_locked(change, ctime, mtime nfs_post_op_update_inode_force_wcc_locked nfs_set_cache_invalid nfs_refresh_inode_locked nfs_update_inode nfsd write response contains change, ctime, mtime, the flag will be clear after nfs_update_inode. Howerver, write response does not contain space_used, previous open response contains space_used whose value is 0, so inode->i_blocks is still 0. nfs_getattr -->called by "du -h" do_update |= force_sync || nfs_attribute_cache_expired -->false in 60s cache_validity = READ_ONCE(NFS_I(inode)->cache_validity) do_update |= cache_validity & (NFS_INO_INVALID_ATTR -->false if (do_update) { __nfs_revalidate_inode } Within 60s, does not send getattr request to nfsd, thus "du -h /mnt/file1M" is 0. Add a NFS_INO_INVALID_BLOCKS flag, set it when nfsv4 write is done. Fixes: 16e143751727 ("NFS: More fine grained attribute tracking") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24include/linux/bitops.h: avoid clang shift-count-overflow warningsArnd Bergmann
[ Upstream commit bd93f003b7462ae39a43c531abca37fe7073b866 ] Clang normally does not warn about certain issues in inline functions when it only happens in an eliminated code path. However if something else goes wrong, it does tend to complain about the definition of hweight_long() on 32-bit targets: include/linux/bitops.h:75:41: error: shift count >= width of type [-Werror,-Wshift-count-overflow] return sizeof(w) == 4 ? hweight32(w) : hweight64(w); ^~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:29:49: note: expanded from macro 'hweight64' define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w)) ^~~~~~~~~~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:21:76: note: expanded from macro '__const_hweight64' define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32)) ^ ~~ include/asm-generic/bitops/const_hweight.h:20:49: note: expanded from macro '__const_hweight32' define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16)) ^ include/asm-generic/bitops/const_hweight.h:19:72: note: expanded from macro '__const_hweight16' define __const_hweight16(w) (__const_hweight8(w) + __const_hweight8((w) >> 8 )) ^ include/asm-generic/bitops/const_hweight.h:12:9: note: expanded from macro '__const_hweight8' (!!((w) & (1ULL << 2))) + \ Adding an explicit cast to __u64 avoids that warning and makes it easier to read other output. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Link: http://lkml.kernel.org/r/20200505135513.65265-1-arnd@arndb.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24/dev/mem: Revoke mappings when a driver claims the regionDan Williams
[ Upstream commit 3234ac664a870e6ea69ae3a57d824cd7edbeacc5 ] Close the hole of holding a mapping over kernel driver takeover event of a given address range. Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges") introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the kernel against scenarios where a /dev/mem user tramples memory that a kernel driver owns. However, this protection only prevents *new* read(), write() and mmap() requests. Established mappings prior to the driver calling request_mem_region() are left alone. Especially with persistent memory, and the core kernel metadata that is stored there, there are plentiful scenarios for a /dev/mem user to violate the expectations of the driver and cause amplified damage. Teach request_mem_region() to find and shoot down active /dev/mem mappings that it believes it has successfully claimed for the exclusive use of the driver. Effectively a driver call to request_mem_region() becomes a hole-punch on the /dev/mem device. The typical usage of unmap_mapping_range() is part of truncate_pagecache() to punch a hole in a file, but in this case the implementation is only doing the "first half" of a hole punch. Namely it is just evacuating current established mappings of the "hole", and it relies on the fact that /dev/mem establishes mappings in terms of absolute physical address offsets. Once existing mmap users are invalidated they can attempt to re-establish the mapping, or attempt to continue issuing read(2) / write(2) to the invalidated extent, but they will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can block those subsequent accesses. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/159009507306.847224.8502634072429766747.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24mfd: stmfx: Disable IRQ in suspend to avoid spurious interruptAmelie Delaunay
[ Upstream commit 97eda5dcc2cde5dcc778bef7a9344db3b6bf8ef5 ] When STMFX supply is stopped, spurious interrupt can occur. To avoid that, disable the interrupt in suspend before disabling the regulator and re-enable it at the end of resume. Fixes: 06252ade9156 ("mfd: Add ST Multi-Function eXpander (STMFX) core driver") Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24usb: gadget: Fix issue with config_ep_by_speed functionPawel Laszczak
[ Upstream commit 5d363120aa548ba52d58907a295eee25f8207ed2 ] This patch adds new config_ep_by_speed_and_alt function which extends the config_ep_by_speed about alt parameter. This additional parameter allows to find proper usb_ss_ep_comp_descriptor. Problem has appeared during testing f_tcm (BOT/UAS) driver function. f_tcm function for SS use array of headers for both BOT/UAS alternate setting: static struct usb_descriptor_header *uasp_ss_function_desc[] = { (struct usb_descriptor_header *) &bot_intf_desc, (struct usb_descriptor_header *) &uasp_ss_bi_desc, (struct usb_descriptor_header *) &bot_bi_ep_comp_desc, (struct usb_descriptor_header *) &uasp_ss_bo_desc, (struct usb_descriptor_header *) &bot_bo_ep_comp_desc, (struct usb_descriptor_header *) &uasp_intf_desc, (struct usb_descriptor_header *) &uasp_ss_bi_desc, (struct usb_descriptor_header *) &uasp_bi_ep_comp_desc, (struct usb_descriptor_header *) &uasp_bi_pipe_desc, (struct usb_descriptor_header *) &uasp_ss_bo_desc, (struct usb_descriptor_header *) &uasp_bo_ep_comp_desc, (struct usb_descriptor_header *) &uasp_bo_pipe_desc, (struct usb_descriptor_header *) &uasp_ss_status_desc, (struct usb_descriptor_header *) &uasp_status_in_ep_comp_desc, (struct usb_descriptor_header *) &uasp_status_pipe_desc, (struct usb_descriptor_header *) &uasp_ss_cmd_desc, (struct usb_descriptor_header *) &uasp_cmd_comp_desc, (struct usb_descriptor_header *) &uasp_cmd_pipe_desc, NULL, }; The first 5 descriptors are associated with BOT alternate setting, and others are associated with UAS. During handling UAS alternate setting f_tcm driver invokes config_ep_by_speed and this function sets incorrect companion endpoint descriptor in usb_ep object. Instead setting ep->comp_desc to uasp_bi_ep_comp_desc function in this case set ep->comp_desc to uasp_ss_bi_desc. This is due to the fact that it searches endpoint based on endpoint address: for_each_ep_desc(speed_desc, d_spd) { chosen_desc = (struct usb_endpoint_descriptor *)*d_spd; if (chosen_desc->bEndpoitAddress == _ep->address) goto ep_found; } And in result it uses the descriptor from BOT alternate setting instead UAS. Finally, it causes that controller driver during enabling endpoints detect that just enabled endpoint for bot. Signed-off-by: Jayshri Pawar <jpawar@cadence.com> Signed-off-by: Pawel Laszczak <pawell@cadence.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24usb: gadget: core: sync interrupt before unbind the udcPeter Chen
[ Upstream commit 3c73bc52195def14165c3a7d91bdbb33b51725f5 ] The threaded interrupt handler may still be called after the usb_gadget_disconnect is called, it causes the structures used at interrupt handler was freed before it uses, eg the usb_request. This issue usually occurs we remove the udc function during the transfer. Below is the example when doing stress test for android switch function, the EP0's request is freed by .unbind (configfs_composite_unbind -> composite_dev_cleanup), but the threaded handler accesses this request during handling setup packet request. In fact, there is no protection between unbind the udc and udc interrupt handling, so we have to avoid the interrupt handler is occurred or scheduled during the .unbind flow. init: Sending signal 9 to service 'adbd' (pid 18077) process group... android_work: did not send uevent (0 0 000000007bec2039) libprocessgroup: Successfully killed process cgroup uid 0 pid 18077 in 6ms init: Service 'adbd' (pid 18077) received signal 9 init: Sending signal 9 to service 'adbd' (pid 18077) process group... libprocessgroup: Successfully killed process cgroup uid 0 pid 18077 in 0ms init: processing action (init.svc.adbd=stopped) from (/init.usb.configfs.rc:14) init: Received control message 'start' for 'adbd' from pid: 399 (/vendor/bin/hw/android.hardware.usb@1. init: starting service 'adbd'... read descriptors read strings Unable to handle kernel read from unreadable memory at virtual address 000000000000002a android_work: sent uevent USB_STATE=CONNECTED Mem abort info: ESR = 0x96000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e97f1000 using random self ethernet address [000000000000002a] pgd=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 232 Comm: irq/68-5b110000 Not tainted 5.4.24-06075-g94a6b52b5815 #92 Hardware name: Freescale i.MX8QXP MEK (DT) pstate: 00400085 (nzcv daIf +PAN -UAO) using random host ethernet address pc : composite_setup+0x5c/0x1730 lr : android_setup+0xc0/0x148 sp : ffff80001349bba0 x29: ffff80001349bba0 x28: ffff00083a50da00 x27: ffff8000124e6000 x26: ffff800010177950 x25: 0000000000000040 x24: ffff000834e18010 x23: 0000000000000000 x22: 0000000000000000 x21: ffff00083a50da00 x20: ffff00082e75ec40 x19: 0000000000000000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000001 x11: ffff80001180fb58 x10: 0000000000000040 x9 : ffff8000120fc980 x8 : 0000000000000000 x7 : ffff00083f98df50 x6 : 0000000000000100 x5 : 00000307e8978431 x4 : ffff800011386788 x3 : 0000000000000000 x2 : ffff800012342000 x1 : 0000000000000000 x0 : ffff800010c6d3a0 Call trace: composite_setup+0x5c/0x1730 android_setup+0xc0/0x148 cdns3_ep0_delegate_req+0x64/0x90 cdns3_check_ep0_interrupt_proceed+0x384/0x738 cdns3_device_thread_irq_handler+0x124/0x6e0 cdns3_thread_irq+0x94/0xa0 irq_thread_fn+0x30/0xa0 irq_thread+0x150/0x248 kthread+0xfc/0x128 ret_from_fork+0x10/0x18 Code: 910e8000 f9400693 12001ed7 79400f79 (3940aa61) ---[ end trace c685db37f8773fba ]--- Kernel panic - not syncing: Fatal exception SMP: stopping secondary CPUs Kernel Offset: disabled CPU features: 0x0002,20002008 Memory Limit: none Rebooting in 5 seconds.. Reviewed-by: Jun Li <jun.li@nxp.com> Signed-off-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Felipe Balbi <balbi@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22sunrpc: clean up properly in gss_mech_unregister()NeilBrown
commit 24c5efe41c29ee3e55bcf5a1c9f61ca8709622e8 upstream. gss_mech_register() calls svcauth_gss_register_pseudoflavor() for each flavour, but gss_mech_unregister() does not call auth_domain_put(). This is unbalanced and makes it impossible to reload the module. Change svcauth_gss_register_pseudoflavor() to return the registered auth_domain, and save it for later release. Cc: stable@vger.kernel.org (v2.6.12+) Link: https://bugzilla.kernel.org/show_bug.cgi?id=206651 Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-22x86/amd_nb: Add AMD family 17h model 60h PCI IDsAlexander Monakov
[ Upstream commit a4e91825d7e1252f7cba005f1451e5464b23c15d ] Add PCI IDs for AMD Renoir (4000-series Ryzen CPUs). This is necessary to enable support for temperature sensors via the k10temp module. Signed-off-by: Alexander Monakov <amonakov@ispras.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Link: https://lkml.kernel.org/r/20200510204842.2603-2-amonakov@ispras.ru Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22serial: 8250_pci: Move Pericom IDs to pci_ids.hKai-Heng Feng
[ Upstream commit 62a7f3009a460001eb46984395280dd900bc4ef4 ] Move the IDs to pci_ids.h so it can be used by next patch. Link: https://lore.kernel.org/r/20200508065343.32751-1-kai.heng.feng@canonical.com Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22PCI: Add Loongson vendor IDTiezhu Yang
[ Upstream commit 9acb9fe18d863aacc99948963f8d5d447dc311be ] Add the Loongson vendor ID to pci_ids.h to be used by the controller driver in the future. The Loongson vendor ID can be found at the following link: https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/tree/pci.ids Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22x86/amd_nb: Add Family 19h PCI IDsYazen Ghannam
[ Upstream commit b3f79ae45904ae987a7c06a9e8d6084d7b73e67f ] Add the new PCI Device 18h IDs for AMD Family 19h systems. Note that Family 19h systems will not have a new PCI root device ID. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200110015651.14887-4-Yazen.Ghannam@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-22PCI: vmd: Add device id for VMD device 8086:9A0BJon Derrick
[ Upstream commit ec11e5c213cc20cac5e8310728b06793448b9f6d ] This patch adds support for this VMD device which supports the bus restriction mode. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>