summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2020-02-14RDMA/uverbs: Verify MR access flagsMichael Guralnik
commit ca95c1411198c2d87217c19d44571052cdc94725 upstream. Verify that MR access flags that are passed from user are all supported ones, otherwise an error is returned. Fixes: 4fca03778351 ("IB/uverbs: Move ib_access_flags and ib_read_counters_flags to uapi") Link: https://lore.kernel.org/r/1578506740-22188-6-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-14IB/mlx5: Return the administrative GUID if existsDanit Goldberg
commit 4bbd4923d1f5627b0c47a9d7dfb5cc91224cfe0c upstream. A user can change the operational GUID (a.k.a affective GUID) through link/infiniband. Therefore it is preferred to return the currently set GUID if it exists instead of the operational. This way the PF can query which VF GUID will be set in the next bind. In order to align with MAC address, zero is returned if administrative GUID is not set. For example, before setting administrative GUID: $ ip link show ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off Then: $ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00 $ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00 After setting administrative GUID: $ ip link show ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off Fixes: 9c0015ef0928 ("IB/mlx5: Implement callbacks for getting VFs GUID attributes") Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11regulator fix for "regulator: core: Add regulator_is_equal() helper"Stephen Rothwell
[ Upstream commit 0468e667a5bead9c1b7ded92861b5a98d8d78745 ] Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20200115120258.0e535fcb@canb.auug.org.au Acked-by: Marek Vasut <marex@denx.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11KVM: Use vcpu-specific gva->hva translation when querying host page sizeSean Christopherson
[ Upstream commit f9b84e19221efc5f493156ee0329df3142085f28 ] Use kvm_vcpu_gfn_to_hva() when retrieving the host page size so that the correct set of memslots is used when handling x86 page faults in SMM. Fixes: 54bf36aac520 ("KVM: x86: use vcpu-specific functions to read/write/translate GFNs") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11io_uring: enable option to only trigger eventfd for async completionsSasha Levin
[ Upstream commit f2842ab5b72d7ee5f7f8385c2d4f32c133f5837b ] If an application is using eventfd notifications with poll to know when new SQEs can be issued, it's expecting the following read/writes to complete inline. And with that, it knows that there are events available, and don't want spurious wakeups on the eventfd for those requests. This adds IORING_REGISTER_EVENTFD_ASYNC, which works just like IORING_REGISTER_EVENTFD, except it only triggers notifications for events that happen from async completions (IRQ, or io-wq worker completions). Any completions inline from the submission itself will not trigger notifications. Suggested-by: Mark Papadakis <markuspapadakis@icloud.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-11x86/apic/msi: Plug non-maskable MSI affinity raceThomas Gleixner
commit 6f1a4891a5928a5969c87fa5a584844c983ec823 upstream. Evan tracked down a subtle race between the update of the MSI message and the device raising an interrupt internally on PCI devices which do not support MSI masking. The update of the MSI message is non-atomic and consists of either 2 or 3 sequential 32bit wide writes to the PCI config space. - Write address low 32bits - Write address high 32bits (If supported by device) - Write data When an interrupt is migrated then both address and data might change, so the kernel attempts to mask the MSI interrupt first. But for MSI masking is optional, so there exist devices which do not provide it. That means that if the device raises an interrupt internally between the writes then a MSI message is sent built from half updated state. On x86 this can lead to spurious interrupts on the wrong interrupt vector when the affinity setting changes both address and data. As a consequence the device interrupt can be lost causing the device to become stuck or malfunctioning. Evan tried to handle that by disabling MSI accross an MSI message update. That's not feasible because disabling MSI has issues on its own: If MSI is disabled the PCI device is routing an interrupt to the legacy INTx mechanism. The INTx delivery can be disabled, but the disablement is not working on all devices. Some devices lose interrupts when both MSI and INTx delivery are disabled. Another way to solve this would be to enforce the allocation of the same vector on all CPUs in the system for this kind of screwed devices. That could be done, but it would bring back the vector space exhaustion problems which got solved a few years ago. Fortunately the high address (if supported by the device) is only relevant when X2APIC is enabled which implies interrupt remapping. In the interrupt remapping case the affinity setting is happening at the interrupt remapping unit and the PCI MSI message is programmed only once when the PCI device is initialized. That makes it possible to solve it with a two step update: 1) Target the MSI msg to the new vector on the current target CPU 2) Target the MSI msg to the new vector on the new target CPU In both cases writing the MSI message is only changing a single 32bit word which prevents the issue of inconsistency. After writing the final destination it is necessary to check whether the device issued an interrupt while the intermediate state #1 (new vector, current CPU) was in effect. This is possible because the affinity change is always happening on the current target CPU. The code runs with interrupts disabled, so the interrupt can be detected by checking the IRR of the local APIC. If the vector is pending in the IRR then the interrupt is retriggered on the new target CPU by sending an IPI for the associated vector on the target CPU. This can cause spurious interrupts on both the local and the new target CPU. 1) If the new vector is not in use on the local CPU and the device affected by the affinity change raised an interrupt during the transitional state (step #1 above) then interrupt entry code will ignore that spurious interrupt. The vector is marked so that the 'No irq handler for vector' warning is supressed once. 2) If the new vector is in use already on the local CPU then the IRR check might see an pending interrupt from the device which is using this vector. The IPI to the new target CPU will then invoke the handler of the device, which got the affinity change, even if that device did not issue an interrupt 3) If the new vector is in use already on the local CPU and the device affected by the affinity change raised an interrupt during the transitional state (step #1 above) then the handler of the device which uses that vector on the local CPU will be invoked. expose issues in device driver interrupt handlers which are not prepared to handle a spurious interrupt correctly. This not a regression, it's just exposing something which was already broken as spurious interrupts can happen for a lot of reasons and all driver handlers need to be able to deal with them. Reported-by: Evan Green <evgreen@chromium.org> Debugged-by: Evan Green <evgreen@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Evan Green <evgreen@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11net/mlx5: Deprecate usage of generic TLS HW capability bitTariq Toukan
[ Upstream commit 61c00cca41aeeaa8e5263c2f81f28534bc1efafb ] Deprecate the generic TLS cap bit, use the new TX-specific TLS cap bit instead. Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11bonding/alb: properly access headers in bond_alb_xmit()Eric Dumazet
[ Upstream commit 38f88c45404293bbc027b956def6c10cbd45c616 ] syzbot managed to send an IPX packet through bond_alb_xmit() and af_packet and triggered a use-after-free. First, bond_alb_xmit() was using ipx_hdr() helper to reach the IPX header, but ipx_hdr() was using the transport offset instead of the network offset. In the particular syzbot report transport offset was 0xFFFF This patch removes ipx_hdr() since it was only (mis)used from bonding. Then we need to make sure IPv4/IPv6/IPX headers are pulled in skb->head before dereferencing anything. BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452 Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108 (if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: [<ffffffff8441fc42>] __dump_stack lib/dump_stack.c:17 [inline] [<ffffffff8441fc42>] dump_stack+0x14d/0x20b lib/dump_stack.c:53 [<ffffffff81a7dec4>] print_address_description+0x6f/0x20b mm/kasan/report.c:282 [<ffffffff81a7e0ec>] kasan_report_error mm/kasan/report.c:380 [inline] [<ffffffff81a7e0ec>] kasan_report mm/kasan/report.c:438 [inline] [<ffffffff81a7e0ec>] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422 [<ffffffff81a7dc4f>] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469 [<ffffffff82c8c00a>] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452 [<ffffffff82c60c74>] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline] [<ffffffff82c60c74>] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224 [<ffffffff83baa558>] __netdev_start_xmit include/linux/netdevice.h:4525 [inline] [<ffffffff83baa558>] netdev_start_xmit include/linux/netdevice.h:4539 [inline] [<ffffffff83baa558>] xmit_one net/core/dev.c:3611 [inline] [<ffffffff83baa558>] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627 [<ffffffff83bacf35>] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238 [<ffffffff83bae3a8>] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278 [<ffffffff84339189>] packet_snd net/packet/af_packet.c:3226 [inline] [<ffffffff84339189>] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252 [<ffffffff83b1ac0c>] sock_sendmsg_nosec net/socket.c:673 [inline] [<ffffffff83b1ac0c>] sock_sendmsg+0x12c/0x160 net/socket.c:684 [<ffffffff83b1f5a2>] __sys_sendto+0x262/0x380 net/socket.c:1996 [<ffffffff83b1f700>] SYSC_sendto net/socket.c:2008 [inline] [<ffffffff83b1f700>] SyS_sendto+0x40/0x60 net/socket.c:2004 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11mfd: bd70528: Fix hour register maskMatti Vaittinen
commit 6c883472e1c11cb05561b6dd0c28bb037c2bf2de upstream. When RTC is used in 24H mode (and it is by this driver) the maximum hour value is 24 in BCD. This occupies bits [5:0] - which means correct mask for HOUR register is 0x3f not 0x1f. Fix the mask Fixes: 32a4a4ebf768 ("rtc: bd70528: Initial support for ROHM bd70528 RTC") Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11regulator: core: Add regulator_is_equal() helperMarek Vasut
commit b059b7e0ec3208ff1e17cff6387d75a9fbab4e02 upstream. Add regulator_is_equal() helper to compare whether two regulators are the same. This is useful for checking whether two separate regulators in a driver are actually the same supply. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: Igor Opaniuk <igor.opaniuk@toradex.com> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: Marcel Ziswiler <marcel.ziswiler@toradex.com> Cc: Mark Brown <broonie@kernel.org> Cc: Oleksandr Suvorov <oleksandr.suvorov@toradex.com> Link: https://lore.kernel.org/r/20191220164450.1395038-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11compat: scsi: sg: fix v3 compat read/write interfaceArnd Bergmann
commit 78ed001d9e7106171e0ee761cd854137dd731302 upstream. In the v5.4 merge window, a cleanup patch from Al Viro conflicted with my rework of the compat handling for sg.c read(). Linus Torvalds did a correct merge but pointed out that the resulting code is still unsatisfactory. I later noticed that the sg_new_read() function still gets the compat mode wrong, when the 'count' argument is large enough to pass a compat_sg_io_hdr object, but not a nativ sg_io_hdr. To address both of these, move the definition of compat_sg_io_hdr into a scsi/sg.h to make it visible to sg.c and rewrite the logic for reading req_pack_id as well as the size check to a simpler version that gets the expected results. Fixes: c35a5cfb4150 ("scsi: sg: sg_read(): simplify reading ->pack_id of userland sg_io_hdr_t") Fixes: 98aaaec4a150 ("compat_ioctl: reimplement SG_IO handling") Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11percpu: Separate decrypted varaibles anytime encryption can be enabledErdem Aktas
commit 264b0d2bee148073c117e7bbbde5be7125a53be1 upstream. CONFIG_VIRTUALIZATION may not be enabled for memory encrypted guests. If disabled, decrypted per-CPU variables may end up sharing the same page with variables that should be left encrypted. Always separate per-CPU variables that should be decrypted into their own page anytime memory encryption can be enabled in the guest rather than rely on any other config option that may not be enabled. Fixes: ac26963a1175 ("percpu: Introduce DEFINE_PER_CPU_DECRYPTED") Cc: stable@vger.kernel.org # 4.15+ Signed-off-by: Erdem Aktas <erdemaktas@google.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flushPeter Zijlstra
commit 0ed1325967ab5f7a4549a2641c6ebe115f76e228 upstream. Architectures for which we have hardware walkers of Linux page table should flush TLB on mmu gather batch allocation failures and batch flush. Some architectures like POWER supports multiple translation modes (hash and radix) and in the case of POWER only radix translation mode needs the above TLBI. This is because for hash translation mode kernel wants to avoid this extra flush since there are no hardware walkers of linux page table. With radix translation, the hardware also walks linux page table and with that, kernel needs to make sure to TLB invalidate page walk cache before page table pages are freed. More details in commit d86564a2f085 ("mm/tlb, x86/mm: Support invalidating TLB caches for RCU_TABLE_FREE") The changes to sparc are to make sure we keep the old behavior since we are now removing HAVE_RCU_TABLE_NO_INVALIDATE. The default value for tlb_needs_table_invalidate is to always force an invalidate and sparc can avoid the table invalidate. Hence we define tlb_needs_table_invalidate to false for sparc architecture. Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.kumar@linux.ibm.com Fixes: a46cc7a90fd8 ("powerpc/mm/radix: Improve TLB/PWC flushes") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVMSean Christopherson
commit 736c291c9f36b07f8889c61764c28edce20e715d upstream. Convert a plethora of parameters and variables in the MMU and page fault flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM. Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical addresses. When TDP is enabled, the fault address is a guest physical address and thus can be a 64-bit value, even when both KVM and its guest are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a 64-bit field, not a natural width field. Using a gva_t for the fault address means KVM will incorrectly drop the upper 32-bits of the GPA. Ditto for gva_to_gpa() when it is used to translate L2 GPAs to L1 GPAs. Opportunistically rename variables and parameters to better reflect the dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain "addr" instead of "vaddr" when the address may be either a GVA or an L2 GPA. Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid a confusing "gpa_t gva" declaration; this also sets the stage for a future patch to combing nonpaging_page_fault() and tdp_page_fault() with minimal churn. Sprinkle in a few comments to document flows where an address is known to be a GVA and thus can be safely truncated to a 32-bit value. Add WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help document such cases and detect bugs. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11x86/kvm: Cache gfn to pfn translationBoris Ostrovsky
commit 917248144db5d7320655dbb41d3af0b8a0f3d589 upstream. __kvm_map_gfn()'s call to gfn_to_pfn_memslot() is * relatively expensive * in certain cases (such as when done from atomic context) cannot be called Stashing gfn-to-pfn mapping should help with both cases. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11x86/kvm: Introduce kvm_(un)map_gfn()Boris Ostrovsky
commit 1eff70a9abd46f175defafd29bc17ad456f398a7 upstream. kvm_vcpu_(un)map operates on gfns from any current address space. In certain cases we want to make sure we are not mapping SMRAM and for that we can use kvm_(un)map_gfn() that we are introducing in this patch. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11eventfd: track eventfd_signal() recursion depthJens Axboe
commit b5e683d5cab8cd433b06ae178621f083cabd4f63 upstream. eventfd use cases from aio and io_uring can deadlock due to circular or resursive calling, when eventfd_signal() tries to grab the waitqueue lock. On top of that, it's also possible to construct notification chains that are deep enough that we could blow the stack. Add a percpu counter that tracks the percpu recursion depth, warn if we exceed it. The counter is also exposed so that users of eventfd_signal() can do the right thing if it's non-zero in the context where it is called. Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11crypto: pcrypt - Avoid deadlock by using per-instance padata queuesHerbert Xu
commit bbefa1dd6a6d53537c11624752219e39959d04fb upstream. If the pcrypt template is used multiple times in an algorithm, then a deadlock occurs because all pcrypt instances share the same padata_instance, which completes requests in the order submitted. That is, the inner pcrypt request waits for the outer pcrypt request while the outer request is already waiting for the inner. This patch fixes this by allocating a set of queues for each pcrypt instance instead of using two global queues. In order to maintain the existing user-space interface, the pinst structure remains global so any sysfs modifications will apply to every pcrypt instance. Note that when an update occurs we have to allocate memory for every pcrypt instance. Should one of the allocations fail we will abort the update without rolling back changes already made. The new per-instance data structure is called padata_shell and is essentially a wrapper around parallel_data. Reproducer: #include <linux/if_alg.h> #include <sys/socket.h> #include <unistd.h> int main() { struct sockaddr_alg addr = { .salg_type = "aead", .salg_name = "pcrypt(pcrypt(rfc4106-gcm-aesni))" }; int algfd, reqfd; char buf[32] = { 0 }; algfd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(algfd, (void *)&addr, sizeof(addr)); setsockopt(algfd, SOL_ALG, ALG_SET_KEY, buf, 20); reqfd = accept(algfd, 0, 0); write(reqfd, buf, 32); read(reqfd, buf, 16); } Reported-by: syzbot+56c7151cad94eec37c521f0e47d2eee53f9361c4@syzkaller.appspotmail.com Fixes: 5068c7a883d1 ("crypto: pcrypt - Add pcrypt crypto parallelization wrapper") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11cpufreq: Avoid creating excessively large stack framesRafael J. Wysocki
commit 1e4f63aecb53e48468661e922fc2fa3b83e55722 upstream. In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11ALSA: hda: Apply aligned MMIO access only conditionallyTakashi Iwai
commit 4d024fe8f806e20e577cc934204c5784c7063293 upstream. It turned out that the recent simplification of HD-audio bus access helpers caused a regression on the virtual HD-audio device on QEMU with ARM platforms. The driver got a CORB/RIRB timeout and couldn't probe any codecs. The essential difference that caused a problem was the enforced aligned MMIO accesses by simplification. Since snd-hda-tegra driver is enabled on ARM, it enables CONFIG_SND_HDA_ALIGNED_MMIO, which makes the all HD-audio drivers using the aligned MMIO accesses. While this is mandatory for snd-hda-tegra, it seems that snd-hda-intel on ARM gets broken by this access pattern. For addressing the regression, this patch introduces a new flag, aligned_mmio, to hdac_bus object, and applies the aligned MMIO only when this flag is set. This change affects only platforms with CONFIG_SND_HDA_ALIGNED_MMIO set, i.e. mostly only for ARM platforms. Unfortunately the patch became a big bigger than it should be, just because the former calls didn't take hdac_bus object in the argument, hence we had to extend the call patterns. Fixes: 19abfefd4c76 ("ALSA: hda: Direct MMIO accesses") BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=1161152 Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200120104127.28985-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11media: v4l2-rect.h: fix v4l2_rect_map_inside() top/left adjustmentsHelen Koike
commit f51e50db4c20d46930b33be3f208851265694f3e upstream. boundary->width and boundary->height are sizes relative to boundary->left and boundary->top coordinates, but they were not being taken into consideration to adjust r->left and r->top, leading to the following error: Consider the follow as initial values for boundary and r: struct v4l2_rect boundary = { .left = 100, .top = 100, .width = 800, .height = 600, } struct v4l2_rect r = { .left = 0, .top = 0, .width = 1920, .height = 960, } calling v4l2_rect_map_inside(&r, &boundary) was modifying r to: r = { .left = 0, .top = 0, .width = 800, .height = 600, } Which is wrongly outside the boundary rectangle, because: v4l2_rect_set_max_size(r, boundary); // r->width = 800, r->height = 600 ... if (r->left + r->width > boundary->width) // true r->left = boundary->width - r->width; // r->left = 800 - 800 if (r->top + r->height > boundary->height) // true r->top = boundary->height - r->height; // r->height = 600 - 600 Fix this by considering top/left coordinates from boundary. Fixes: ac49de8c49d7 ("[media] v4l2-rect.h: new header with struct v4l2_rect helper functions") Signed-off-by: Helen Koike <helen.koike@collabora.com> Cc: <stable@vger.kernel.org> # for v4.7 and up Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-11memcg: fix a crash in wb_workfn when a device disappearsTheodore Ts'o
commit 68f23b89067fdf187763e75a56087550624fdbee upstream. Without memcg, there is a one-to-one mapping between the bdi and bdi_writeback structures. In this world, things are fairly straightforward; the first thing bdi_unregister() does is to shutdown the bdi_writeback structure (or wb), and part of that writeback ensures that no other work queued against the wb, and that the wb is fully drained. With memcg, however, there is a one-to-many relationship between the bdi and bdi_writeback structures; that is, there are multiple wb objects which can all point to a single bdi. There is a refcount which prevents the bdi object from being released (and hence, unregistered). So in theory, the bdi_unregister() *should* only get called once its refcount goes to zero (bdi_put will drop the refcount, and when it is zero, release_bdi gets called, which calls bdi_unregister). Unfortunately, del_gendisk() in block/gen_hd.c never got the memo about the Brave New memcg World, and calls bdi_unregister directly. It does this without informing the file system, or the memcg code, or anything else. This causes the root wb associated with the bdi to be unregistered, but none of the memcg-specific wb's are shutdown. So when one of these wb's are woken up to do delayed work, they try to dereference their wb->bdi->dev to fetch the device name, but unfortunately bdi->dev is now NULL, thanks to the bdi_unregister() called by del_gendisk(). As a result, *boom*. Fortunately, it looks like the rest of the writeback path is perfectly happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to create a bdi_dev_name() function which can handle bdi->dev being NULL. This also allows us to bulletproof the writeback tracepoints to prevent them from dereferencing a NULL pointer and crashing the kernel if one is tracing with memcg's enabled, and an iSCSI device dies or a USB storage stick is pulled. The most common way of triggering this will be hotremoval of a device while writeback with memcg enabled is going on. It was triggering several times a day in a heavily loaded production environment. Google Bug Id: 145475544 Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-01udp: segment looped gso packets correctlyWillem de Bruijn
[ Upstream commit 6cd021a58c18a1731f7e47f83e172c0c302d65e5 ] Multicast and broadcast packets can be looped from egress to ingress pre segmentation with dev_loopback_xmit. That function unconditionally sets ip_summed to CHECKSUM_UNNECESSARY. udp_rcv_segment segments gso packets in the udp rx path. Segmentation usually executes on egress, and does not expect packets of this type. __udp_gso_segment interprets !CHECKSUM_PARTIAL as CHECKSUM_NONE. But the offsets are not correct for gso_make_checksum. UDP GSO packets are of type CHECKSUM_PARTIAL, with their uh->check set to the correct pseudo header checksum. Reset ip_summed to this type. (CHECKSUM_PARTIAL is allowed on ingress, see comments in skbuff.h) Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-01net_sched: fix ops->bind_class() implementationsCong Wang
[ Upstream commit 2e24cd755552350b94a7617617c6877b8cbcb701 ] The current implementations of ops->bind_class() are merely searching for classid and updating class in the struct tcf_result, without invoking either of cl_ops->bind_tcf() or cl_ops->unbind_tcf(). This breaks the design of them as qdisc's like cbq use them to count filters too. This is why syzbot triggered the warning in cbq_destroy_class(). In order to fix this, we have to call cl_ops->bind_tcf() and cl_ops->unbind_tcf() like the filter binding path. This patch does so by refactoring out two helper functions __tcf_bind_filter() and __tcf_unbind_filter(), which are lockless and accept a Qdisc pointer, then teaching each implementation to call them correctly. Note, we merely pass the Qdisc pointer as an opaque pointer to each filter, they only need to pass it down to the helper functions without understanding it at all. Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-01USB: serial: ir-usb: fix link-speed handlingJohan Hovold
commit 17a0184ca17e288decdca8b2841531e34d49285f upstream. Commit e0d795e4f36c ("usb: irda: cleanup on ir-usb module") added a USB IrDA header with common defines, but mistakingly switched to using the class-descriptor baud-rate bitmask values for the outbound header. This broke link-speed handling for rates above 9600 baud, but a device would also be able to operate at the default 9600 baud until a link-speed request was issued (e.g. using the TCGETS ioctl). Fixes: e0d795e4f36c ("usb: irda: cleanup on ir-usb module") Cc: stable <stable@vger.kernel.org> # 2.6.27 Cc: Felipe Balbi <balbi@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-26Merge tag 'block-5.5-2020-01-26' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fix from Jens Axboe: "Unfortunately this weekend we had a few last minute reports, one was for block. The partition disable for zoned devices was overly restrictive, it can work (and be supported) just fine for host-aware variants. Here's a fix ensuring that's the case so we don't break existing users of that" * tag 'block-5.5-2020-01-26' of git://git.kernel.dk/linux-block: block: allow partitions on host aware zone devices
2020-01-26block: allow partitions on host aware zone devicesChristoph Hellwig
Host-aware SMR drives can be used with the commands to explicitly manage zone state, but they can also be used as normal disks. In the former case it makes perfect sense to allow partitions on them, in the latter it does not, just like for host managed devices. Add a check to add_partition to allow partitions on host aware devices, but give up any zone management capabilities in that case, which also catches the previously missed case of adding a partition vs just scanning it. Because sd can rescan the attribute at runtime it needs to check if a disk has partitions, for which a new helper is added to genhd.h. Fixes: 5eac3eb30c9a ("block: Remove partition support for zoned block devices") Reported-by: Borislav Petkov <bp@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Off by one in mt76 airtime calculation, from Dan Carpenter. 2) Fix TLV fragment allocation loop condition in iwlwifi, from Luca Coelho. 3) Don't confirm neigh entries when doing ipsec pmtu updates, from Xu Wang. 4) More checks to make sure we only send TSO packets to lan78xx chips that they can actually handle. From James Hughes. 5) Fix ip_tunnel namespace move, from William Dauchy. 6) Fix unintended packet reordering due to cooperation between listification done by GRO and non-GRO paths. From Maxim Mikityanskiy. 7) Add Jakub Kicincki formally as networking co-maintainer. 8) Info leak in airo ioctls, from Michael Ellerman. 9) IFLA_MTU attribute needs validation during rtnl_create_link(), from Eric Dumazet. 10) Use after free during reload in mlxsw, from Ido Schimmel. 11) Dangling pointers are possible in tp->highest_sack, fix from Eric Dumazet. 12) Missing *pos++ in various networking seq_next handlers, from Vasily Averin. 13) CHELSIO_GET_MEM operation neds CAP_NET_ADMIN check, from Michael Ellerman. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (109 commits) firestream: fix memory leaks net: cxgb3_main: Add CAP_NET_ADMIN check to CHELSIO_GET_MEM net: bcmgenet: Use netif_tx_napi_add() for TX NAPI tipc: change maintainer email address net: stmmac: platform: fix probe for ACPI devices net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel path net/mlx5e: kTLS, Remove redundant posts in TX resync flow net/mlx5e: kTLS, Fix corner-case checks in TX resync flow net/mlx5e: Clear VF config when switching modes net/mlx5: DR, use non preemptible call to get the current cpu number net/mlx5: E-Switch, Prevent ingress rate configuration of uplink rep net/mlx5: DR, Enable counter on non-fwd-dest objects net/mlx5: Update the list of the PCI supported devices net/mlx5: Fix lowest FDB pool size net: Fix skb->csum update in inet_proto_csum_replace16(). netfilter: nf_tables: autoload modules from the abort path netfilter: nf_tables: add __nft_chain_type_get() netfilter: nf_tables_offload: fix check the chain offload flag netfilter: conntrack: sctp: use distinct states for new SCTP connections ipv6_route_seq_next should increase position index ...
2020-01-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Missing netlink attribute sanity check for NFTA_OSF_DREG, from Florian Westphal. 2) Use bitmap infrastructure in ipset to fix KASAN slab-out-of-bounds reads, from Jozsef Kadlecsik. 3) Missing initial CLOSED state in new sctp connection through ctnetlink events, from Jiri Wiesner. 4) Missing check for NFT_CHAIN_HW_OFFLOAD in nf_tables offload indirect block infrastructure, from wenxu. 5) Add __nft_chain_type_get() to sanity check family and chain type. 6) Autoload modules from the nf_tables abort path to fix races reported by syzbot. 7) Remove unnecessary skb->csum update on inet_proto_csum_replace16(), from Praveen Chaudhary. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-24netfilter: nf_tables: autoload modules from the abort pathPablo Neira Ayuso
This patch introduces a list of pending module requests. This new module list is composed of nft_module_request objects that contain the module name and one status field that tells if the module has been already loaded (the 'done' field). In the first pass, from the preparation phase, the netlink command finds that a module is missing on this list. Then, a module request is allocated and added to this list and nft_request_module() returns -EAGAIN. This triggers the abort path with the autoload parameter set on from nfnetlink, request_module() is called and the module request enters the 'done' state. Since the mutex is released when loading modules from the abort phase, the module list is zapped so this is iteration occurs over a local list. Therefore, the request_module() calls happen when object lists are in consistent state (after fulling aborting the transaction) and the commit list is empty. On the second pass, the netlink command will find that it already tried to load the module, so it does not request it again and nft_request_module() returns 0. Then, there is a look up to find the object that the command was missing. If the module was successfully loaded, the command proceeds normally since it finds the missing object in place, otherwise -ENOENT is reported to userspace. This patch also updates nfnetlink to include the reason to enter the abort phase, which is required for this new autoload module rationale. Fixes: ec7470b834fe ("netfilter: nf_tables: store transaction list locally while requesting module") Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-23Merge tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds
Pull XArray fixes from Matthew Wilcox: "Primarily bugfixes, mostly around handling index wrap-around correctly. A couple of doc fixes and adding missing APIs. I had an oops live on stage at linux.conf.au this year, and it turned out to be a bug in xas_find() which I can't prove isn't triggerable in the current codebase. Then in looking for the bug, I spotted two more bugs. The bots have had a few days to chew on this with no problems reported, and it passes the test-suite (which now has more tests to make sure these problems don't come back)" * tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax: XArray: Add xa_for_each_range XArray: Fix xas_find returning too many entries XArray: Fix xa_find_after with multi-index entries XArray: Fix infinite loop with entry at ULONG_MAX XArray: Add wrappers for nested spinlocks XArray: Improve documentation of search marks XArray: Fix xas_pause at ULONG_MAX
2020-01-23Merge tag 'trace-v5.5-rc6-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various tracing fixes: - Fix a function comparison warning for a xen trace event macro - Fix a double perf_event linking to a trace_uprobe_filter for multiple events - Fix suspicious RCU warnings in trace event code for using list_for_each_entry_rcu() when the "_rcu" portion wasn't needed. - Fix a bug in the histogram code when using the same variable - Fix a NULL pointer dereference when tracefs lockdown enabled and calling trace_set_default_clock() - A fix to a bug found with the double perf_event linking patch" * tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/uprobe: Fix to make trace_uprobe_filter alignment safe tracing: Do not set trace clock if tracefs lockdown is in effect tracing: Fix histogram code when expression has same var as value tracing: trigger: Replace unneeded RCU-list traversals tracing/uprobe: Fix double perf_event linking on multiprobe uprobe tracing: xen: Ordered comparison of function pointers
2020-01-23net: rtnetlink: validate IFLA_MTU attribute in rtnl_create_link()Eric Dumazet
rtnl_create_link() needs to apply dev->min_mtu and dev->max_mtu checks that we apply in do_setlink() Otherwise malicious users can crash the kernel, for example after an integer overflow : BUG: KASAN: use-after-free in memset include/linux/string.h:365 [inline] BUG: KASAN: use-after-free in __alloc_skb+0x37b/0x5e0 net/core/skbuff.c:238 Write of size 32 at addr ffff88819f20b9c0 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x134/0x1a0 mm/kasan/generic.c:192 memset+0x24/0x40 mm/kasan/common.c:108 memset include/linux/string.h:365 [inline] __alloc_skb+0x37b/0x5e0 net/core/skbuff.c:238 alloc_skb include/linux/skbuff.h:1049 [inline] alloc_skb_with_frags+0x93/0x590 net/core/skbuff.c:5664 sock_alloc_send_pskb+0x7ad/0x920 net/core/sock.c:2242 sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2259 mld_newpack+0x1d7/0x7f0 net/ipv6/mcast.c:1609 add_grhead.isra.0+0x299/0x370 net/ipv6/mcast.c:1713 add_grec+0x7db/0x10b0 net/ipv6/mcast.c:1844 mld_send_cr net/ipv6/mcast.c:1970 [inline] mld_ifc_timer_expire+0x3d3/0x950 net/ipv6/mcast.c:2477 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x6c3/0x1790 kernel/time/timer.c:1786 __do_softirq+0x262/0x98c kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x19b/0x1e0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61 Code: 98 6b ea f9 eb 8a cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 44 1c 60 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 34 1c 60 00 fb f4 <c3> cc 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 4e 5d 9a f9 e8 79 RSP: 0018:ffffffff89807ce8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff13266ae RBX: ffffffff8987a1c0 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffffffff8987aa54 RBP: ffffffff89807d18 R08: ffffffff8987a1c0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000 R13: ffffffff8a799980 R14: 0000000000000000 R15: 0000000000000000 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:690 default_idle_call+0x84/0xb0 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x3c8/0x6e0 kernel/sched/idle.c:269 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:361 rest_init+0x23b/0x371 init/main.c:451 arch_call_rest_init+0xe/0x1b start_kernel+0x904/0x943 init/main.c:784 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490 x86_64_start_kernel+0x77/0x7b arch/x86/kernel/head64.c:471 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242 The buggy address belongs to the page: page:ffffea00067c82c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 raw: 057ffe0000000000 ffffea00067c82c8 ffffea00067c82c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88819f20b880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88819f20b900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff88819f20b980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88819f20ba00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88819f20ba80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-22Merge tag 'io_uring-5.5-2020-01-22' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fix from Jens Axboe: "This was supposed to have gone in last week, but due to a brain fart on my part, I forgot that we made this struct addition in the 5.5 cycle. So here it is for 5.5, to prevent having a 32 vs 64-bit compatability issue with the files_update command" * tag 'io_uring-5.5-2020-01-22' of git://git.kernel.dk/linux-block: io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
2020-01-20io_uring: fix compat for IORING_REGISTER_FILES_UPDATEEugene Syromiatnikov
fds field of struct io_uring_files_update is problematic with regards to compat user space, as pointer size is different in 32-bit, 32-on-64-bit, and 64-bit user space. In order to avoid custom handling of compat in the syscall implementation, make fds __u64 and use u64_to_user_ptr in order to retrieve it. Also, align the field naturally and check that no garbage is passed there. Fixes: c3a31e605620c279 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE") Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-20netfilter: ipset: use bitmap infrastructure completelyKadlecsik József
The bitmap allocation did not use full unsigned long sizes when calculating the required size and that was triggered by KASAN as slab-out-of-bounds read in several places. The patch fixes all of them. Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-01-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix non-blocking connect() in x25, from Martin Schiller. 2) Fix spurious decryption errors in kTLS, from Jakub Kicinski. 3) Netfilter use-after-free in mtype_destroy(), from Cong Wang. 4) Limit size of TSO packets properly in lan78xx driver, from Eric Dumazet. 5) r8152 probe needs an endpoint sanity check, from Johan Hovold. 6) Prevent looping in tcp_bpf_unhash() during sockmap/tls free, from John Fastabend. 7) hns3 needs short frames padded on transmit, from Yunsheng Lin. 8) Fix netfilter ICMP header corruption, from Eyal Birger. 9) Fix soft lockup when low on memory in hns3, from Yonglong Liu. 10) Fix NTUPLE firmware command failures in bnxt_en, from Michael Chan. 11) Fix memory leak in act_ctinfo, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits) cxgb4: reject overlapped queues in TC-MQPRIO offload cxgb4: fix Tx multi channel port rate limit net: sched: act_ctinfo: fix memory leak bnxt_en: Do not treat DSN (Digital Serial Number) read failure as fatal. bnxt_en: Fix ipv6 RFS filter matching logic. bnxt_en: Fix NTUPLE firmware command failures. net: systemport: Fixed queue mapping in internal ring map net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec net: dsa: sja1105: Don't error out on disabled ports with no phy-mode net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset net: hns: fix soft lockup when there is not enough memory net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key() net/sched: act_ife: initalize ife->metalist earlier netfilter: nat: fix ICMP header corruption on ICMP errors net: wan: lapbether.c: Use built-in RCU list checking netfilter: nf_tables: fix flowtable list del corruption netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks() netfilter: nf_tables: remove WARN and add NLA_STRING upper limits netfilter: nft_tunnel: ERSPAN_VERSION must not be null netfilter: nft_tunnel: fix null-attribute check ...
2020-01-18Merge tag 'drm-fixes-2020-01-19' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Back from LCA2020, fixes wasn't too busy last week, seems to have quieten down appropriately, some amdgpu, i915, then a core mst fix and one fix for virtio-gpu and one for rockchip: core mst: - serialize down messages and clear timeslots are on unplug amdgpu: - Update golden settings for renoir - eDP fix i915: - uAPI fix: Remove dash and colon from PMU names to comply with tools/perf - Fix for include file that was indirectly included - Two fixes to make sure VMA are marked active for error capture virtio: - maintain obj reservation lock when submitting cmds rockchip: - increase link rate var size to accommodate rates" * tag 'drm-fixes-2020-01-19' of git://anongit.freedesktop.org/drm/drm: drm/amd/display: Reorder detect_edp_sink_caps before link settings read. drm/amdgpu: update goldensetting for renoir drm/dp_mst: Have DP_Tx send one msg at a time drm/dp_mst: clear time slots for ports invalid drm/i915/pmu: Do not use colons or dashes in PMU names drm/rockchip: fix integer type used for storing dp data rate drm/i915/gt: Mark ring->vma as active while pinned drm/i915/gt: Mark context->state vma as active while pinned drm/i915/gt: Skip trying to unbind in restore_ggtt_mappings drm/i915: Add missing include file <linux/math64.h> drm/virtio: add missing virtio_gpu_array_lock_resv call
2020-01-18Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rseq fixes from Ingo Molnar: "Two rseq bugfixes: - CLONE_VM !CLONE_THREAD didn't work properly, the kernel would end up corrupting the TLS of the parent. Technically a change in the ABI but the previous behavior couldn't resonably have been relied on by applications so this looks like a valid exception to the ABI rule. - Make the RSEQ_FLAG_UNREGISTER ABI behavior consistent with the handling of other flags. This is not thought to impact any applications either" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: Unregister rseq for clone CLONE_VM rseq: Reject unknown flags on rseq unregister
2020-01-17XArray: Add xa_for_each_rangeMatthew Wilcox (Oracle)
This function supports iterating over a range of an array. Also add documentation links for xa_for_each_start(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-01-17XArray: Add wrappers for nested spinlocksMatthew Wilcox (Oracle)
Some users need to take an xarray lock while holding another xarray lock. Reported-by: Doug Gilbert <dgilbert@interlog.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-01-18Merge tag 'drm-misc-fixes-2020-01-16' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes virtio: maintain obj reservation lock when submitting cmds (Gerd) rockchip: increase link rate var size to accommodate rates (Tobias) mst: serialize down messages and clear timeslots are on unplug (Wayne) Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Tobias Schramm <t.schramm@manjaro.org> Cc: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> From: Sean Paul <sean@poorly.run> Link: https://patchwork.freedesktop.org/patch/msgid/20200116162856.GA11524@art_vandelay
2020-01-17Merge tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Three fixes that should go into this release: - The 32-bit segment size fix that I mentioned last week (Ming) - Use uint for the block size (Mikulas) - A null_blk zone write handling fix (Damien)" * tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-block: block: fix an integer overflow in logical block size null_blk: Fix zone write handling block: fix get_max_segment_size() overflow on 32bit arch
2020-01-16Merge tag 'armsoc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Olof Johansson: "I've been sitting on these longer than I meant, so the patch count is a bit higher than ideal for this part of the release. There's also some reverts of double-applied patches that brings the diffstat up a bit. With that said, the biggest changes are: - Revert of duplicate i2c device addition on two Aspeed (BMC) Devicetrees. - Move of two device nodes that got applied to the wrong part of the tree on ASpeed G6. - Regulator fix for Beaglebone X15 (adding 12/5V supplies) - Use interrupts for keys on Amlogic SM1 to avoid missed polls In addition to that, there is a collection of smaller DT fixes: - Power supply assignment fixes for i.MX6 - Fix of interrupt line for magnetometer on i.MX8 Librem5 devkit - Build fixlets (selects) for davinci/omap2+ - More interrupt number fixes for Stratix10, Amlogic SM1, etc. - ... and more similar fixes across different platforms And some non-DT stuff: - optee fix to register multiple shared pages properly - Clock calculation fixes for MMP3 - Clock fixes for OMAP as well" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits) MAINTAINERS: Add myself as the co-maintainer for Actions Semi platforms ARM: dts: imx7: Fix Toradex Colibri iMX7S 256MB NAND flash support ARM: dts: imx6sll-evk: Remove incorrect power supply assignment ARM: dts: imx6sl-evk: Remove incorrect power supply assignment ARM: dts: imx6sx-sdb: Remove incorrect power supply assignment ARM: dts: imx6qdl-sabresd: Remove incorrect power supply assignment ARM: dts: imx6q-icore-mipi: Use 1.5 version of i.Core MX6DL ARM: omap2plus: select RESET_CONTROLLER ARM: davinci: select CONFIG_RESET_CONTROLLER ARM: dts: aspeed: rainier: Fix fan fault and presence ARM: dts: aspeed: rainier: Remove duplicate i2c busses ARM: dts: aspeed: tacoma: Remove duplicate flash nodes ARM: dts: aspeed: tacoma: Remove duplicate i2c busses ARM: dts: aspeed: tacoma: Fix fsi master node ARM: dts: aspeed-g6: Fix FSI master location ARM: dts: mmp3: Fix the TWSI ranges clk: mmp2: Fix the order of timer mux parents ARM: mmp: do not divide the clock rate arm64: dts: rockchip: Fix IR on Beelink A1 optee: Fix multi page dynamic shm pool alloc ...
2020-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2020-01-15 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 9 day(s) which contain a total of 13 files changed, 95 insertions(+), 43 deletions(-). The main changes are: 1) Fix refcount leak for TCP time wait and request sockets for socket lookup related BPF helpers, from Lorenz Bauer. 2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann. 3) Batch of several sockmap and related TLS fixes found while operating more complex BPF programs with Cilium and OpenSSL, from John Fastabend. 4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue() to avoid purging data upon teardown, from Lingpeng Chen. 5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly dump a BPF map's value with BTF, from Martin KaFai Lau. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-15block: fix an integer overflow in logical block sizeMikulas Patocka
Logical block size has type unsigned short. That means that it can be at most 32768. However, there are architectures that can run with 64k pages (for example arm64) and on these architectures, it may be possible to create block devices with 64k block size. For exmaple (run this on an architecture with 64k pages): Mount will fail with this error because it tries to read the superblock using 2-sector access: device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536 EXT4-fs (dm-0): unable to read superblock This patch changes the logical block size from unsigned short to unsigned int to avoid the overflow. Cc: stable@vger.kernel.org Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-15bpf: Sockmap/tls, push write_space updates through ulp updatesJohn Fastabend
When sockmap sock with TLS enabled is removed we cleanup bpf/psock state and call tcp_update_ulp() to push updates to TLS ULP on top. However, we don't push the write_space callback up and instead simply overwrite the op with the psock stored previous op. This may or may not be correct so to ensure we don't overwrite the TLS write space hook pass this field to the ULP and have it fixup the ctx. This completes a previous fix that pushed the ops through to the ULP but at the time missed doing this for write_space, presumably because write_space TLS hook was added around the same time. Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
2020-01-15bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loopJohn Fastabend
When a sockmap is free'd and a socket in the map is enabled with tls we tear down the bpf context on the socket, the psock struct and state, and then call tcp_update_ulp(). The tcp_update_ulp() call is to inform the tls stack it needs to update its saved sock ops so that when the tls socket is later destroyed it doesn't try to call the now destroyed psock hooks. This is about keeping stacked ULPs in good shape so they always have the right set of stacked ops. However, recently unhash() hook was removed from TLS side. But, the sockmap/bpf side is not doing any extra work to update the unhash op when is torn down instead expecting TLS side to manage it. So both TLS and sockmap believe the other side is managing the op and instead no one updates the hook so it continues to point at tcp_bpf_unhash(). When unhash hook is called we call tcp_bpf_unhash() which detects the psock has already been destroyed and calls sk->sk_prot_unhash() which calls tcp_bpf_unhash() yet again and so on looping and hanging the core. To fix have sockmap tear down logic fixup the stale pointer. Fixes: 5d92e631b8be ("net/tls: partially revert fix transition through disconnect with close") Reported-by: syzbot+83979935eb6304f8cd46@syzkaller.appspotmail.com Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/bpf/20200111061206.8028-2-john.fastabend@gmail.com
2020-01-15drm/dp_mst: Have DP_Tx send one msg at a timeWayne Lin
[Why] Noticed this while testing MST with the 4 ports MST hub from StarTech.com. Sometimes can't light up monitors normally and get the error message as 'sideband msg build failed'. Look into aux transactions, found out that source sometimes will send out another down request before receiving the down reply of the previous down request. On the other hand, in drm_dp_get_one_sb_msg(), current code doesn't handle the interleaved replies case. Hence, source can't build up message completely and can't light up monitors. [How] For good compatibility, enforce source to send out one down request at a time. Add a flag, is_waiting_for_dwn_reply, to determine if the source can send out a down request immediately or not. - Check the flag before calling process_single_down_tx_qlock to send out a msg - Set the flag when successfully send out a down request - Clear the flag when successfully build up a down reply - Clear the flag when find erros during sending out a down request - Clear the flag when find errors during building up a down reply - Clear the flag when timeout occurs during waiting for a down reply - Use drm_dp_mst_kick_tx() to try to send another down request in queue at the end of drm_dp_mst_wait_tx_reply() (attempt to send out messages in queue when errors occur) Cc: Lyude Paul <lyude@redhat.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200113093649.11755-1-Wayne.Lin@amd.com
2020-01-15bpf: Fix incorrect verifier simulation of ARSH under ALU32Daniel Borkmann
Anatoly has been fuzzing with kBdysch harness and reported a hang in one of the outcomes: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (85) call bpf_get_socket_cookie#46 1: R0_w=invP(id=0) R10=fp0 1: (57) r0 &= 808464432 2: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0 2: (14) w0 -= 810299440 3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0 3: (c4) w0 s>>= 1 4: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0 4: (76) if w0 s>= 0x30303030 goto pc+216 221: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0 221: (95) exit processed 6 insns (limit 1000000) [...] Taking a closer look, the program was xlated as follows: # ./bpftool p d x i 12 0: (85) call bpf_get_socket_cookie#7800896 1: (bf) r6 = r0 2: (57) r6 &= 808464432 3: (14) w6 -= 810299440 4: (c4) w6 s>>= 1 5: (76) if w6 s>= 0x30303030 goto pc+216 6: (05) goto pc-1 7: (05) goto pc-1 8: (05) goto pc-1 [...] 220: (05) goto pc-1 221: (05) goto pc-1 222: (95) exit Meaning, the visible effect is very similar to f54c7898ed1c ("bpf: Fix precision tracking for unbounded scalars"), that is, the fall-through branch in the instruction 5 is considered to be never taken given the conclusion from the min/max bounds tracking in w6, and therefore the dead-code sanitation rewrites it as goto pc-1. However, real-life input disagrees with verification analysis since a soft-lockup was observed. The bug sits in the analysis of the ARSH. The definition is that we shift the target register value right by K bits through shifting in copies of its sign bit. In adjust_scalar_min_max_vals(), we do first coerce the register into 32 bit mode, same happens after simulating the operation. However, for the case of simulating the actual ARSH, we don't take the mode into account and act as if it's always 64 bit, but location of sign bit is different: dst_reg->smin_value >>= umin_val; dst_reg->smax_value >>= umin_val; dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val); Consider an unknown R0 where bpf_get_socket_cookie() (or others) would for example return 0xffff. With the above ARSH simulation, we'd see the following results: [...] 1: R1=ctx(id=0,off=0,imm=0) R2_w=invP65535 R10=fp0 1: (85) call bpf_get_socket_cookie#46 2: R0_w=invP(id=0) R10=fp0 2: (57) r0 &= 808464432 -> R0_runtime = 0x3030 3: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0 3: (14) w0 -= 810299440 -> R0_runtime = 0xcfb40000 4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0 (0xffffffff) 4: (c4) w0 s>>= 1 -> R0_runtime = 0xe7da0000 5: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0 (0x67c00000) (0x7ffbfff8) [...] In insn 3, we have a runtime value of 0xcfb40000, which is '1100 1111 1011 0100 0000 0000 0000 0000', the result after the shift has 0xe7da0000 that is '1110 0111 1101 1010 0000 0000 0000 0000', where the sign bit is correctly retained in 32 bit mode. In insn4, the umax was 0xffffffff, and changed into 0x7ffbfff8 after the shift, that is, '0111 1111 1111 1011 1111 1111 1111 1000' and means here that the simulation didn't retain the sign bit. With above logic, the updates happen on the 64 bit min/max bounds and given we coerced the register, the sign bits of the bounds are cleared as well, meaning, we need to force the simulation into s32 space for 32 bit alu mode. Verification after the fix below. We're first analyzing the fall-through branch on 32 bit signed >= test eventually leading to rejection of the program in this specific case: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r2 = 808464432 1: R1=ctx(id=0,off=0,imm=0) R2_w=invP808464432 R10=fp0 1: (85) call bpf_get_socket_cookie#46 2: R0_w=invP(id=0) R10=fp0 2: (bf) r6 = r0 3: R0_w=invP(id=0) R6_w=invP(id=0) R10=fp0 3: (57) r6 &= 808464432 4: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0 4: (14) w6 -= 810299440 5: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0 5: (c4) w6 s>>= 1 6: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0 (0x67c00000) (0xfffbfff8) 6: (76) if w6 s>= 0x30303030 goto pc+216 7: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0 7: (30) r0 = *(u8 *)skb[808464432] BPF_LD_[ABS|IND] uses reserved fields processed 8 insns (limit 1000000) [...] Fixes: 9cbe1f5a32dc ("bpf/verifier: improve register value range tracking with ARSH") Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115204733.16648-1-daniel@iogearbox.net