summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2020-05-27l2tp: device MTU setup, tunnel socket needs a lockR. Parameswaran
commit 57240d007816486131bee88cd474c2a71f0fe224 upstream. The MTU overhead calculation in L2TP device set-up merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff needs to be adjusted to lock the tunnel socket while referencing the sub-data structures to derive the socket's IP overhead. Reported-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: R. Parameswaran <rparames@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Giuliano Procida <gprocida@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-27New kernel function to get IP overhead on a socket.R. Parameswaran
commit 113c3075931a334f899008f6c753abe70a3a9323 upstream. A new function, kernel_sock_ip_overhead(), is provided to calculate the cumulative overhead imposed by the IP Header and IP options, if any, on a socket's payload. The new function returns an overhead of zero for sockets that do not belong to the IPv4 or IPv6 address families. This is used in the L2TP code path to compute the total outer IP overhead on the L2TP tunnel socket when calculating the default MTU for Ethernet pseudowires. Signed-off-by: R. Parameswaran <rparames@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Giuliano Procida <gprocida@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-27padata: Replace delayed timer with immediate workqueue in padata_reorderHerbert Xu
[ Upstream commit 6fc4dbcf0276279d488c5fbbfabe94734134f4fa ] The function padata_reorder will use a timer when it cannot progress while completed jobs are outstanding (pd->reorder_objects > 0). This is suboptimal as if we do end up using the timer then it would have introduced a gratuitous delay of one second. In fact we can easily distinguish between whether completed jobs are outstanding and whether we can make progress. All we have to do is look at the next pqueue list. This patch does that by replacing pd->processed with pd->cpu so that the next pqueue is more accessible. A work queue is used instead of the original try_again to avoid hogging the CPU. Note that we don't bother removing the work queue in padata_flush_queues because the whole premise is broken. You cannot flush async crypto requests so it makes no sense to even try. A subsequent patch will fix it by replacing it with a ref counting scheme. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> [dj: - adjust context - corrected setup_timer -> timer_setup to delete hunk - skip padata_flush_queues() hunk, function already removed in 4.9] Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-27padata: ensure padata_do_serial() runs on the correct CPUMathias Krause
commit 350ef88e7e922354f82a931897ad4a4ce6c686ff upstream. If the algorithm we're parallelizing is asynchronous we might change CPUs between padata_do_parallel() and padata_do_serial(). However, we don't expect this to happen as we need to enqueue the padata object into the per-cpu reorder queue we took it from, i.e. the same-cpu's parallel queue. Ensure we're not switching CPUs for a given padata object by tracking the CPU within the padata object. If the serial callback gets called on the wrong CPU, defer invoking padata_reorder() via a kernel worker on the CPU we're expected to run on. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-27padata: ensure the reorder timer callback runs on the correct CPUMathias Krause
commit cf5868c8a22dc2854b96e9569064bb92365549ca upstream. The reorder timer function runs on the CPU where the timer interrupt was handled which is not necessarily one of the CPUs of the 'pcpu' CPU mask set. Ensure the padata_reorder() callback runs on the correct CPU, which is one in the 'pcpu' CPU mask set and, preferrably, the next expected one. Do so by comparing the current CPU with the expected target CPU. If they match, call padata_reorder() right away. If they differ, schedule a work item on the target CPU that does the padata_reorder() call for us. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20x86: Fix early boot crash on gcc-10, third tryBorislav Petkov
commit a9a3ed1eff3601b63aea4fb462d8b3b92c7c1e7e upstream. ... or the odyssey of trying to disable the stack protector for the function which generates the stack canary value. The whole story started with Sergei reporting a boot crash with a kernel built with gcc-10: Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 #139 Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013 Call Trace: dump_stack panic ? start_secondary __stack_chk_fail start_secondary secondary_startup_64 -—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary This happens because gcc-10 tail-call optimizes the last function call in start_secondary() - cpu_startup_entry() - and thus emits a stack canary check which fails because the canary value changes after the boot_init_stack_canary() call. To fix that, the initial attempt was to mark the one function which generates the stack canary with: __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused) however, using the optimize attribute doesn't work cumulatively as the attribute does not add to but rather replaces previously supplied optimization options - roughly all -fxxx options. The key one among them being -fno-omit-frame-pointer and thus leading to not present frame pointer - frame pointer which the kernel needs. The next attempt to prevent compilers from tail-call optimizing the last function call cpu_startup_entry(), shy of carving out start_secondary() into a separate compilation unit and building it with -fno-stack-protector, was to add an empty asm(""). This current solution was short and sweet, and reportedly, is supported by both compilers but we didn't get very far this time: future (LTO?) optimization passes could potentially eliminate this, which leads us to the third attempt: having an actual memory barrier there which the compiler cannot ignore or move around etc. That should hold for a long time, but hey we said that about the other two solutions too so... Reported-by: Sergei Trofimovich <slyfox@gentoo.org> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kalle Valo <kvalo@codeaurora.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20gcc-10 warnings: fix low-hanging fruitLinus Torvalds
commit 9d82973e032e246ff5663c9805fbb5407ae932e3 upstream. Due to a bug-report that was compiler-dependent, I updated one of my machines to gcc-10. That shows a lot of new warnings. Happily they seem to be mostly the valid kind, but it's going to cause a round of churn for getting rid of them.. This is the really low-hanging fruit of removing a couple of zero-sized arrays in some core code. We have had a round of these patches before, and we'll have many more coming, and there is nothing special about these except that they were particularly trivial, and triggered more warnings than most. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20pnp: Use list_for_each_entry() instead of open codingJason Gunthorpe
commit 01b2bafe57b19d9119413f138765ef57990921ce upstream. Aside from good practice, this avoids a warning from gcc 10: ./include/linux/kernel.h:997:3: warning: array subscript -31 is outside array bounds of ‘struct list_head[1]’ [-Warray-bounds] 997 | ((type *)(__mptr - offsetof(type, member))); }) | ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/list.h:493:2: note: in expansion of macro ‘container_of’ 493 | container_of(ptr, type, member) | ^~~~~~~~~~~~ ./include/linux/pnp.h:275:30: note: in expansion of macro ‘list_entry’ 275 | #define global_to_pnp_dev(n) list_entry(n, struct pnp_dev, global_list) | ^~~~~~~~~~ ./include/linux/pnp.h:281:11: note: in expansion of macro ‘global_to_pnp_dev’ 281 | (dev) != global_to_pnp_dev(&pnp_global); \ | ^~~~~~~~~~~~~~~~~ arch/x86/kernel/rtc.c:189:2: note: in expansion of macro ‘pnp_for_each_dev’ 189 | pnp_for_each_dev(dev) { Because the common code doesn't cast the starting list_head to the containing struct. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> [ rjw: Whitespace adjustments ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-20ptp: fix the race between the release of ptp_clock and cdevVladis Dronov
commit a33121e5487b424339636b25c35d3a180eaa5f5e upstream. In a case when a ptp chardev (like /dev/ptp0) is open but an underlying device is removed, closing this file leads to a race. This reproduces easily in a kvm virtual machine: ts# cat openptp0.c int main() { ... fp = fopen("/dev/ptp0", "r"); ... sleep(10); } ts# uname -r 5.5.0-rc3-46cf053e ts# cat /proc/cmdline ... slub_debug=FZP ts# modprobe ptp_kvm ts# ./openptp0 & [1] 670 opened /dev/ptp0, sleeping 10s... ts# rmmod ptp_kvm ts# ls /dev/ptp* ls: cannot access '/dev/ptp*': No such file or directory ts# ...woken up [ 48.010809] general protection fault: 0000 [#1] SMP [ 48.012502] CPU: 6 PID: 658 Comm: openptp0 Not tainted 5.5.0-rc3-46cf053e #25 [ 48.014624] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... [ 48.016270] RIP: 0010:module_put.part.0+0x7/0x80 [ 48.017939] RSP: 0018:ffffb3850073be00 EFLAGS: 00010202 [ 48.018339] RAX: 000000006b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: ffff89a476c00ad0 [ 48.018936] RDX: fffff65a08d3ea08 RSI: 0000000000000247 RDI: 6b6b6b6b6b6b6b6b [ 48.019470] ... ^^^ a slub poison [ 48.023854] Call Trace: [ 48.024050] __fput+0x21f/0x240 [ 48.024288] task_work_run+0x79/0x90 [ 48.024555] do_exit+0x2af/0xab0 [ 48.024799] ? vfs_write+0x16a/0x190 [ 48.025082] do_group_exit+0x35/0x90 [ 48.025387] __x64_sys_exit_group+0xf/0x10 [ 48.025737] do_syscall_64+0x3d/0x130 [ 48.026056] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 48.026479] RIP: 0033:0x7f53b12082f6 [ 48.026792] ... [ 48.030945] Modules linked in: ptp i6300esb watchdog [last unloaded: ptp_kvm] [ 48.045001] Fixing recursive fault but reboot is needed! This happens in: static void __fput(struct file *file) { ... if (file->f_op->release) file->f_op->release(inode, file); <<< cdev is kfree'd here if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL && !(mode & FMODE_PATH))) { cdev_put(inode->i_cdev); <<< cdev fields are accessed here Namely: __fput() posix_clock_release() kref_put(&clk->kref, delete_clock) <<< the last reference delete_clock() delete_ptp_clock() kfree(ptp) <<< cdev is embedded in ptp cdev_put module_put(p->owner) <<< *p is kfree'd, bang! Here cdev is embedded in posix_clock which is embedded in ptp_clock. The race happens because ptp_clock's lifetime is controlled by two refcounts: kref and cdev.kobj in posix_clock. This is wrong. Make ptp_clock's sysfs device a parent of cdev with cdev_device_add() created especially for such cases. This way the parent device with its ptp_clock is not released until all references to the cdev are released. This adds a requirement that an initialized but not exposed struct device should be provided to posix_clock_register() by a caller instead of a simple dev_t. This approach was adopted from the commit 72139dfa2464 ("watchdog: Fix the race between the release of watchdog_core_data and cdev"). See details of the implementation in the commit 233ed09d7fda ("chardev: add helper function to register char devs with a struct device"). Link: https://lore.kernel.org/linux-fsdevel/20191125125342.6189-1-vdronov@redhat.com/T/#u Analyzed-by: Stephen Johnston <sjohnsto@redhat.com> Analyzed-by: Vern Lovejoy <vlovejoy@redhat.com> Signed-off-by: Vladis Dronov <vdronov@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20chardev: add helper function to register char devs with a struct deviceLogan Gunthorpe
commit 233ed09d7fdacf592ee91e6c97ce5f4364fbe7c0 upstream. Credit for this patch goes is shared with Dan Williams [1]. I've taken things one step further to make the helper function more useful and clean up calling code. There's a common pattern in the kernel whereby a struct cdev is placed in a structure along side a struct device which manages the life-cycle of both. In the naive approach, the reference counting is broken and the struct device can free everything before the chardev code is entirely released. Many developers have solved this problem by linking the internal kobjs in this fashion: cdev.kobj.parent = &parent_dev.kobj; The cdev code explicitly gets and puts a reference to it's kobj parent. So this seems like it was intended to be used this way. Dmitrty Torokhov first put this in place in 2012 with this commit: 2f0157f char_dev: pin parent kobject and the first instance of the fix was then done in the input subsystem in the following commit: 4a215aa Input: fix use-after-free introduced with dynamic minor changes Subsequently over the years, however, this issue seems to have tripped up multiple developers independently. For example, see these commits: 0d5b7da iio: Prevent race between IIO chardev opening and IIO device (by Lars-Peter Clausen in 2013) ba0ef85 tpm: Fix initialization of the cdev (by Jason Gunthorpe in 2015) 5b28dde [media] media: fix use-after-free in cdev_put() when app exits after driver unbind (by Shauh Khan in 2016) This technique is similarly done in at least 15 places within the kernel and probably should have been done so in another, at least, 5 places. The kobj line also looks very suspect in that one would not expect drivers to have to mess with kobject internals in this way. Even highly experienced kernel developers can be surprised by this code, as seen in [2]. To help alleviate this situation, and hopefully prevent future wasted effort on this problem, this patch introduces a helper function to register a char device along with its parent struct device. This creates a more regular API for tying a char device to its parent without the developer having to set members in the underlying kobject. This patch introduce cdev_device_add and cdev_device_del which replaces a common pattern including setting the kobj parent, calling cdev_add and then calling device_add. It also introduces cdev_set_parent for the few cases that set the kobject parent without using device_add. [1] https://lkml.org/lkml/2017/2/13/700 [2] https://lkml.org/lkml/2017/2/10/370 Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20blktrace: Protect q->blk_trace with RCUJan Kara
commit c780e86dd48ef6467a1146cf7d0fe1e05a635039 upstream. KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed the switching of block tracing (and thus eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing and thus it can happen that the blk_trace structure is being freed just while __blk_add_trace() works on it. Protect accesses to q->blk_trace by RCU during tracing and make sure we wait for the end of RCU grace period when shutting down tracing. Luckily that is rare enough event that we can afford that. Note that postponing the freeing of blk_trace to an RCU callback should better be avoided as it could have unexpected user visible side-effects as debugfs files would be still existing for a short while block tracing has been shut down. Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711 CC: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reported-by: Tristan Madani <tristmd@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk> [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-20blktrace: Fix potential deadlock between delete & sysfs opsWaiman Long
commit 5acb3cc2c2e9d3020a4fee43763c6463767f1572 upstream. The lockdep code had reported the following unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#228); lock(&bdev->bd_mutex/1); lock(s_active#228); lock(&bdev->bd_mutex); *** DEADLOCK *** The deadlock may happen when one task (CPU1) is trying to delete a partition in a block device and another task (CPU0) is accessing tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that partition. The s_active isn't an actual lock. It is a reference count (kn->count) on the sysfs (kernfs) file. Removal of a sysfs file, however, require a wait until all the references are gone. The reference count is treated like a rwsem using lockdep instrumentation code. The fact that a thread is in the sysfs callback method or in the ioctl call means there is a reference to the opended sysfs or device file. That should prevent the underlying block structure from being removed. Instead of using bd_mutex in the block_device structure, a new blk_trace_mutex is now added to the request_queue structure to protect access to the blk_trace structure. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Fix typo in patch subject line, and prune a comment detailing how the code used to work. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-05-10mac80211: add ieee80211_is_any_nullfunc()Thomas Pedersen
commit 30b2f0be23fb40e58d0ad2caf8702c2a44cda2e1 upstream. commit 08a5bdde3812 ("mac80211: consider QoS Null frames for STA_NULLFUNC_ACKED") Fixed a bug where we failed to take into account a nullfunc frame can be either non-QoS or QoS. It turns out there is at least one more bug in ieee80211_sta_tx_notify(), introduced in commit 7b6ddeaf27ec ("mac80211: use QoS NDP for AP probing"), where we forgot to check for the QoS variant and so assumed the QoS nullfunc frame never went out Fix this by adding a helper ieee80211_is_any_nullfunc() which consolidates the check for non-QoS and QoS nullfunc frames. Replace existing compound conditionals and add a couple more missing checks for QoS variant. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20200114055940.18502-3-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02KVM: Check validity of resolved slot when searching memslotsSean Christopherson
commit b6467ab142b708dd076f6186ca274f14af379c72 upstream. Check that the resolved slot (somewhat confusingly named 'start') is a valid/allocated slot before doing the final comparison to see if the specified gfn resides in the associated slot. The resolved slot can be invalid if the binary search loop terminated because the search index was incremented beyond the number of used slots. This bug has existed since the binary search algorithm was introduced, but went unnoticed because KVM statically allocated memory for the max number of slots, i.e. the access would only be truly out-of-bounds if all possible slots were allocated and the specified gfn was less than the base of the lowest memslot. Commit 36947254e5f98 ("KVM: Dynamically size memslot array based on number of used slots") eliminated the "all possible slots allocated" condition and made the bug embarrasingly easy to hit. Fixes: 9c1a5d38780e6 ("kvm: optimize GFN to memslot lookup with large slots amount") Reported-by: syzbot+d889b59b2bb87d4047a2@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200408064059.8957-2-sean.j.christopherson@intel.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02vmalloc: fix remap_vmalloc_range() bounds checksJann Horn
commit bdebd6a2831b6fab69eb85cee74a8ba77f1a1cc2 upstream. remap_vmalloc_range() has had various issues with the bounds checks it promises to perform ("This function checks that addr is a valid vmalloc'ed area, and that it is big enough to cover the vma") over time, e.g.: - not detecting pgoff<<PAGE_SHIFT overflow - not detecting (pgoff<<PAGE_SHIFT)+usize overflow - not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same vmalloc allocation - comparing a potentially wildly out-of-bounds pointer with the end of the vmalloc region In particular, since commit fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer dereferences by calling mmap() on a BPF map with a size that is bigger than the distance from the start of the BPF map to the end of the address space. This could theoretically be used as a kernel ASLR bypass, by using whether mmap() with a given offset oopses or returns an error code to perform a binary search over the possible address range. To allow remap_vmalloc_range_partial() to verify that addr and addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset to remap_vmalloc_range_partial() instead of adding it to the pointer in remap_vmalloc_range(). In remap_vmalloc_range_partial(), fix the check against get_vm_area_size() by using size comparisons instead of pointer comparisons, and add checks for pgoff. Fixes: 833423143c3a ("[PATCH] mm: introduce remap_vmalloc_range()") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Andrii Nakryiko <andriin@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@chromium.org> Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-05-02overflow.h: Add arithmetic shift helperJason Gunthorpe
commit 0c66847793d1982d1083dc6f7adad60fa265ce9c upstream. Add shift_overflow() helper to assist driver authors in ensuring that shift operations don't cause overflows or other odd conditions. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> [kees: tweaked comments and commit log, dropped unneeded assignment] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24compiler.h: fix error in BUILD_BUG_ON() reportingVegard Nossum
[ Upstream commit af9c5d2e3b355854ff0e4acfbfbfadcd5198a349 ] compiletime_assert() uses __LINE__ to create a unique function name. This means that if you have more than one BUILD_BUG_ON() in the same source line (which can happen if they appear e.g. in a macro), then the error message from the compiler might output the wrong condition. For this source file: #include <linux/build_bug.h> #define macro() \ BUILD_BUG_ON(1); \ BUILD_BUG_ON(0); void foo() { macro(); } gcc would output: ./include/linux/compiler.h:350:38: error: call to `__compiletime_assert_9' declared with attribute error: BUILD_BUG_ON failed: 0 _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__) However, it was not the BUILD_BUG_ON(0) that failed, so it should say 1 instead of 0. With this patch, we use __COUNTER__ instead of __LINE__, so each BUILD_BUG_ON() gets a different function name and the correct condition is printed: ./include/linux/compiler.h:350:38: error: call to `__compiletime_assert_0' declared with attribute error: BUILD_BUG_ON failed: 1 _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Daniel Santos <daniel.santos@pobox.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Ian Abbott <abbotti@mev.co.uk> Cc: Joe Perches <joe@perches.com> Link: http://lkml.kernel.org/r/20200331112637.25047-1-vegard.nossum@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-24percpu_counter: fix a data race at vm_committed_asQian Cai
[ Upstream commit 7e2345200262e4a6056580f0231cccdaffc825f3 ] "vm_committed_as.count" could be accessed concurrently as reported by KCSAN, BUG: KCSAN: data-race in __vm_enough_memory / percpu_counter_add_batch write to 0xffffffff9451c538 of 8 bytes by task 65879 on cpu 35: percpu_counter_add_batch+0x83/0xd0 percpu_counter_add_batch at lib/percpu_counter.c:91 __vm_enough_memory+0xb9/0x260 dup_mm+0x3a4/0x8f0 copy_process+0x2458/0x3240 _do_fork+0xaa/0x9f0 __do_sys_clone+0x125/0x160 __x64_sys_clone+0x70/0x90 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe read to 0xffffffff9451c538 of 8 bytes by task 66773 on cpu 19: __vm_enough_memory+0x199/0x260 percpu_counter_read_positive at include/linux/percpu_counter.h:81 (inlined by) __vm_enough_memory at mm/util.c:839 mmap_region+0x1b2/0xa10 do_mmap+0x45c/0x700 vm_mmap_pgoff+0xc0/0x130 ksys_mmap_pgoff+0x6e/0x300 __x64_sys_mmap+0x33/0x40 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe The read is outside percpu_counter::lock critical section which results in a data race. Fix it by adding a READ_ONCE() in percpu_counter_read_positive() which could also service as the existing compiler memory barrier. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Marco Elver <elver@google.com> Link: http://lkml.kernel.org/r/1582302724-2804-1-git-send-email-cai@lca.pw Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-24signal: Extend exec_id to 64bitsEric W. Biederman
commit d1e7fd6462ca9fc76650fbe6ca800e35b24267da upstream. Replace the 32bit exec_id with a 64bit exec_id to make it impossible to wrap the exec_id counter. With care an attacker can cause exec_id wrap and send arbitrary signals to a newly exec'd parent. This bypasses the signal sending checks if the parent changes their credentials during exec. The severity of this problem can been seen that in my limited testing of a 32bit exec_id it can take as little as 19s to exec 65536 times. Which means that it can take as little as 14 days to wrap a 32bit exec_id. Adam Zabrocki has succeeded wrapping the self_exe_id in 7 days. Even my slower timing is in the uptime of a typical server. Which means self_exec_id is simply a speed bump today, and if exec gets noticably faster self_exec_id won't even be a speed bump. Extending self_exec_id to 64bits introduces a problem on 32bit architectures where reading self_exec_id is no longer atomic and can take two read instructions. Which means that is is possible to hit a window where the read value of exec_id does not match the written value. So with very lucky timing after this change this still remains expoiltable. I have updated the update of exec_id on exec to use WRITE_ONCE and the read of exec_id in do_notify_parent to use READ_ONCE to make it clear that there is no locking between these two locations. Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl Fixes: 2.3.23pre2 Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-24thermal: devfreq_cooling: inline all stubs for CONFIG_DEVFREQ_THERMAL=nMartin Blumenstingl
commit 3f5b9959041e0db6dacbea80bb833bff5900999f upstream. When CONFIG_DEVFREQ_THERMAL is disabled all functions except of_devfreq_cooling_register_power() were already inlined. Also inline the last function to avoid compile errors when multiple drivers call of_devfreq_cooling_register_power() when CONFIG_DEVFREQ_THERMAL is not set. Compilation failed with the following message: multiple definition of `of_devfreq_cooling_register_power' (which then lists all usages of of_devfreq_cooling_register_power()) Thomas Zimmermann reported this problem [0] on a kernel config with CONFIG_DRM_LIMA={m,y}, CONFIG_DRM_PANFROST={m,y} and CONFIG_DEVFREQ_THERMAL=n after both, the lima and panfrost drivers gained devfreq cooling support. [0] https://www.spinics.net/lists/dri-devel/msg252825.html Fixes: a76caf55e5b356 ("thermal: Add devfreq cooling") Cc: stable@vger.kernel.org Reported-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200403205133.1101808-1-martin.blumenstingl@googlemail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02locking/atomic, kref: Add kref_read()Peter Zijlstra
commit 2c935bc57221cc2edc787c72ea0e2d30cdcd3d5e upstream. Since we need to change the implementation, stop exposing internals. Provide kref_read() to read the current reference count; typically used for debug messages. Kills two anti-patterns: atomic_read(&kref->refcount) kref->refcount.counter Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> [only add kref_read() to kref.h for stable backports - gregkh] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02vt: switch vt_dont_switch to boolJiri Slaby
commit f400991bf872debffb01c46da882dc97d7e3248e upstream. vt_dont_switch is pure boolean, no need for whole char. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200219073951.16151-6-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02vt: selection, introduce vc_is_selJiri Slaby
commit dce05aa6eec977f1472abed95ccd71276b9a3864 upstream. Avoid global variables (namely sel_cons) by introducing vc_is_sel. It checks whether the parameter is the current selection console. This will help putting sel_cons to a struct later. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200219073951.16151-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02futex: Fix inode life-time issuePeter Zijlstra
commit 8019ad13ef7f64be44d4f892af9c840179009254 upstream. As reported by Jann, ihold() does not in fact guarantee inode persistence. And instead of making it so, replace the usage of inode pointers with a per boot, machine wide, unique inode identifier. This sequence number is global, but shared (file backed) futexes are rare enough that this should not become a performance issue. Reported-by: Jann Horn <jannh@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-02x86/mm: split vmalloc_sync_all()Joerg Roedel
commit 763802b53a427ed3cbd419dbba255c414fdd9e7c upstream. Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the vunmap() code-path. While this change was necessary to maintain correctness on x86-32-pae kernels, it also adds additional cycles for architectures that don't need it. Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported severe performance regressions in micro-benchmarks because it now also calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But the vmalloc_sync_all() implementation on x86-64 is only needed for newly created mappings. To avoid the unnecessary work on x86-64 and to gain the performance back, split up vmalloc_sync_all() into two functions: * vmalloc_sync_mappings(), and * vmalloc_sync_unmappings() Most call-sites to vmalloc_sync_all() only care about new mappings being synchronized. The only exception is the new call-site added in the above mentioned commit. Shile Zhang directed us to a report of an 80% regression in reaim throughput. Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES] Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-20net: phy: fix MDIO bus PM PHY resumingHeiner Kallweit
[ Upstream commit 611d779af7cad2b87487ff58e4931a90c20b113c ] So far we have the unfortunate situation that mdio_bus_phy_may_suspend() is called in suspend AND resume path, assuming that function result is the same. After the original change this is no longer the case, resulting in broken resume as reported by Geert. To fix this call mdio_bus_phy_may_suspend() in the suspend path only, and let the phy_device store the info whether it was suspended by MDIO bus PM. Fixes: 503ba7c69610 ("net: phy: Avoid multiple suspends") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11include/linux/bitops.h: introduce BITS_PER_TYPEChris Wilson
commit 9144d75e22cad3c89e6b2ccab551db9ee28d250a upstream. net_dim.h has a rather useful extension to BITS_PER_BYTE to compute the number of bits in a type (BITS_PER_BYTE * sizeof(T)), so promote the macro to bitops.h, alongside BITS_PER_BYTE, for wider usage. Link: http://lkml.kernel.org/r/20180706094458.14116-1-chris@chris-wilson.co.uk Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Andy Gospodarek <gospo@broadcom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [only take the bitops.h portion for stable kernels - gregkh] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-11HID: core: increase HID report buffer size to 8KiBJohan Korsnes
commit 84a4062632462c4320704fcdf8e99e89e94c0aba upstream. We have a HID touch device that reports its opens and shorts test results in HID buffers of size 8184 bytes. The maximum size of the HID buffer is currently set to 4096 bytes, causing probe of this device to fail. With this patch we increase the maximum size of the HID buffer to 8192 bytes, making device probe and acquisition of said buffers succeed. Signed-off-by: Johan Korsnes <jkorsnes@cisco.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Armando Visconti <armando.visconti@st.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28ata: ahci: Add shutdown to freeze hardware resources of ahciPrabhakar Kushwaha
commit 10a663a1b15134a5a714aa515e11425a44d4fdf7 upstream. device_shutdown() called from reboot or power_shutdown expect all devices to be shutdown. Same is true for even ahci pci driver. As no ahci shutdown function is implemented, the ata subsystem always remains alive with DMA & interrupt support. File system related calls should not be honored after device_shutdown(). So defining ahci pci driver shutdown to freeze hardware (mask interrupt, stop DMA engine and free DMA resources). Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28rcu: Use WRITE_ONCE() for assignments to ->pprev for hlist_nullsPaul E. McKenney
[ Upstream commit 860c8802ace14c646864795e057349c9fb2d60ad ] Eric Dumazet supplied a KCSAN report of a bug that forces use of hlist_unhashed_lockless() from sk_unhashed(): ------------------------------------------------------------------------ BUG: KCSAN: data-race in inet_unhash / inet_unhash write to 0xffff8880a69a0170 of 8 bytes by interrupt on cpu 1: __hlist_nulls_del include/linux/list_nulls.h:88 [inline] hlist_nulls_del_init_rcu include/linux/rculist_nulls.h:36 [inline] __sk_nulls_del_node_init_rcu include/net/sock.h:676 [inline] inet_unhash+0x38f/0x4a0 net/ipv4/inet_hashtables.c:612 tcp_set_state+0xfa/0x3e0 net/ipv4/tcp.c:2249 tcp_done+0x93/0x1e0 net/ipv4/tcp.c:3854 tcp_write_err+0x7e/0xc0 net/ipv4/tcp_timer.c:56 tcp_retransmit_timer+0x9b8/0x16d0 net/ipv4/tcp_timer.c:479 tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:599 tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:619 call_timer_fn+0x5f/0x2f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0xc0c/0xcd0 kernel/time/timer.c:1786 __do_softirq+0x115/0x33f kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0xbb/0xe0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x1af/0x280 kernel/sched/idle.c:263 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355 start_secondary+0x208/0x260 arch/x86/kernel/smpboot.c:264 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241 read to 0xffff8880a69a0170 of 8 bytes by interrupt on cpu 0: sk_unhashed include/net/sock.h:607 [inline] inet_unhash+0x3d/0x4a0 net/ipv4/inet_hashtables.c:592 tcp_set_state+0xfa/0x3e0 net/ipv4/tcp.c:2249 tcp_done+0x93/0x1e0 net/ipv4/tcp.c:3854 tcp_write_err+0x7e/0xc0 net/ipv4/tcp_timer.c:56 tcp_retransmit_timer+0x9b8/0x16d0 net/ipv4/tcp_timer.c:479 tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:599 tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:619 call_timer_fn+0x5f/0x2f0 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0xc0c/0xcd0 kernel/time/timer.c:1786 __do_softirq+0x115/0x33f kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0xbb/0xe0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 native_safe_halt+0xe/0x10 arch/x86/kernel/paravirt.c:71 arch_cpu_idle+0x1f/0x30 arch/x86/kernel/process.c:571 default_idle_call+0x1e/0x40 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x1af/0x280 kernel/sched/idle.c:263 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:355 rest_init+0xec/0xf6 init/main.c:452 arch_call_rest_init+0x17/0x37 start_kernel+0x838/0x85e init/main.c:786 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490 x86_64_start_kernel+0x72/0x76 arch/x86/kernel/head64.c:471 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:241 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc6+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ------------------------------------------------------------------------ This commit therefore replaces C-language assignments with WRITE_ONCE() in include/linux/list_nulls.h and include/linux/rculist_nulls.h. Reported-by: Eric Dumazet <edumazet@google.com> # For KCSAN Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-05USB: serial: ir-usb: fix link-speed handlingJohan Hovold
commit 17a0184ca17e288decdca8b2841531e34d49285f upstream. Commit e0d795e4f36c ("usb: irda: cleanup on ir-usb module") added a USB IrDA header with common defines, but mistakingly switched to using the class-descriptor baud-rate bitmask values for the outbound header. This broke link-speed handling for rates above 9600 baud, but a device would also be able to operate at the default 9600 baud until a link-speed request was issued (e.g. using the TCGETS ioctl). Fixes: e0d795e4f36c ("usb: irda: cleanup on ir-usb module") Cc: stable <stable@vger.kernel.org> # 2.6.27 Cc: Felipe Balbi <balbi@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29netfilter: ipset: use bitmap infrastructure completelyKadlecsik József
commit 32c72165dbd0e246e69d16a3ad348a4851afd415 upstream. The bitmap allocation did not use full unsigned long sizes when calculating the required size and that was triggered by KASAN as slab-out-of-bounds read in several places. The patch fixes all of them. Reported-by: syzbot+fabca5cbf5e54f3fe2de@syzkaller.appspotmail.com Reported-by: syzbot+827ced406c9a1d9570ed@syzkaller.appspotmail.com Reported-by: syzbot+190d63957b22ef673ea5@syzkaller.appspotmail.com Reported-by: syzbot+dfccdb2bdb4a12ad425e@syzkaller.appspotmail.com Reported-by: syzbot+df0d0f5895ef1f41a65b@syzkaller.appspotmail.com Reported-by: syzbot+b08bd19bb37513357fd4@syzkaller.appspotmail.com Reported-by: syzbot+53cdd0ec0bbabd53370a@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free()Andy Shevchenko
commit c42b65e363ce97a828f81b59033c3558f8fa7f70 upstream. A lot of code become ugly because of open coding allocations for bitmaps. Introduce three helpers to allow users be more clear of intention and keep their code neat. Note, due to multiple circular dependencies we may not provide the helpers as inliners. For now we keep them exported and, perhaps, at some point in the future we will sort out header inclusion and inheritance. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-29dmaengine: imx-sdma: fix size check for sdma script_numberRobin Gong
[ Upstream commit bd73dfabdda280fc5f05bdec79b6721b4b2f035f ] Illegal memory will be touch if SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3 (41) exceed the size of structure sdma_script_start_addrs(40), thus cause memory corrupt such as slob block header so that kernel trap into while() loop forever in slob_free(). Please refer to below code piece in imx-sdma.c: for (i = 0; i < sdma->script_number; i++) if (addr_arr[i] > 0) saddr_arr[i] = addr_arr[i]; /* memory corrupt here */ That issue was brought by commit a572460be9cf ("dmaengine: imx-sdma: Add support for version 3 firmware") because SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3 (38->41 3 scripts added) not align with script number added in sdma_script_start_addrs(2 scripts). Fixes: a572460be9cf ("dmaengine: imx-sdma: Add support for version 3 firmware") Cc: stable@vger.kernel Link: https://www.spinics.net/lists/arm-kernel/msg754895.html Signed-off-by: Robin Gong <yibin.gong@nxp.com> Reported-by: Jurgen Lambrecht <J.Lambrecht@TELEVIC.com> Link: https://lore.kernel.org/r/1569347584-3478-1-git-send-email-yibin.gong@nxp.com [vkoul: update the patch title] Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-29signal: Allow cifs and drbd to receive their terminating signalsEric W. Biederman
[ Upstream commit 33da8e7c814f77310250bb54a9db36a44c5de784 ] My recent to change to only use force_sig for a synchronous events wound up breaking signal reception cifs and drbd. I had overlooked the fact that by default kthreads start out with all signals set to SIG_IGN. So a change I thought was safe turned out to have made it impossible for those kernel thread to catch their signals. Reverting the work on force_sig is a bad idea because what the code was doing was very much a misuse of force_sig. As the way force_sig ultimately allowed the signal to happen was to change the signal handler to SIG_DFL. Which after the first signal will allow userspace to send signals to these kernel threads. At least for wake_ack_receiver in drbd that does not appear actively wrong. So correct this problem by adding allow_kernel_signal that will allow signals whose siginfo reports they were sent by the kernel through, but will not allow userspace generated signals, and update cifs and drbd to call allow_kernel_signal in an appropriate place so that their thread can receive this signal. Fixing things this way ensures that userspace won't be able to send signals and cause problems, that it is clear which signals the threads are expecting to receive, and it guarantees that nothing else in the system will be affected. This change was partly inspired by similar cifs and drbd patches that added allow_signal. Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com> Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Steve French <smfrench@gmail.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: David Laight <David.Laight@ACULAB.COM> Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes") Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig") Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig") Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-29net/mlx5: Fix mlx5_ifc_query_lag_out_bitsMark Zhang
[ Upstream commit ea77388b02270b0af8dc57f668f311235ea068f0 ] Remove the "reserved_at_40" field to match the device specification. Fixes: 84df61ebc69b ("net/mlx5: Add HW interfaces used by LAG") Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-29devres: allow const resource argumentsArnd Bergmann
[ Upstream commit 9dea44c91469512d346e638694c22c30a5273992 ] devm_ioremap_resource() does not currently take 'const' arguments, which results in a warning from the first driver trying to do it anyway: drivers/gpio/gpio-amd-fch.c: In function 'amd_fch_gpio_probe': drivers/gpio/gpio-amd-fch.c:171:49: error: passing argument 2 of 'devm_ioremap_resource' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers] priv->base = devm_ioremap_resource(&pdev->dev, &amd_fch_gpio_iores); ^~~~~~~~~~~~~~~~~~~ Change the prototype to allow it, as there is no real reason not to. Fixes: 9bb2e0452508 ("gpio: amd: Make resource struct const") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20190628150049.1108048-1-arnd@arndb.de Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviwed-By: Enrico Weigelt <info@metux.net> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-23regulator: ab8500: Remove SYSCLKREQ from enum ab8505_regulator_idStephan Gerhold
commit 458ea3ad033fc86e291712ce50cbe60c3428cf30 upstream. Those regulators are not actually supported by the AB8500 regulator driver. There is no ab8500_regulator_info for them and no entry in ab8505_regulator_match. As such, they cannot be registered successfully, and looking them up in ab8505_regulator_match causes an out-of-bounds array read. Fixes: 547f384f33db ("regulator: ab8500: add support for ab8505") Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20191106173125.14496-2-stephan@gerhold.net Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23block: fix an integer overflow in logical block sizeMikulas Patocka
commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream. Logical block size has type unsigned short. That means that it can be at most 32768. However, there are architectures that can run with 64k pages (for example arm64) and on these architectures, it may be possible to create block devices with 64k block size. For exmaple (run this on an architecture with 64k pages): Mount will fail with this error because it tries to read the superblock using 2-sector access: device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536 EXT4-fs (dm-0): unable to read superblock This patch changes the logical block size from unsigned short to unsigned int to avoid the overflow. Cc: stable@vger.kernel.org Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23fs/select: avoid clang stack usage warningArnd Bergmann
commit ad312f95d41c9de19313c51e388c4984451c010f upstream. The select() implementation is carefully tuned to put a sensible amount of data on the stack for holding a copy of the user space fd_set, but not too large to risk overflowing the kernel stack. When building a 32-bit kernel with clang, we need a little more space than with gcc, which often triggers a warning: fs/select.c:619:5: error: stack frame size of 1048 bytes in function 'core_sys_select' [-Werror,-Wframe-larger-than=] int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp, I experimentally found that for 32-bit ARM, reducing the maximum stack usage by 64 bytes keeps us reliably under the warning limit again. Link: http://lkml.kernel.org/r/20190307090146.1874906-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14can: can_dropped_invalid_skb(): ensure an initialized headroom in outgoing ↵Oliver Hartkopp
CAN sk_buffs commit e7153bf70c3496bac00e7e4f395bb8d8394ac0ea upstream. KMSAN sysbot detected a read access to an untinitialized value in the headroom of an outgoing CAN related sk_buff. When using CAN sockets this area is filled appropriately - but when using a packet socket this initialization is missing. The problematic read access occurs in the CAN receive path which can only be triggered when the sk_buff is sent through a (virtual) CAN interface. So we check in the sending path whether we need to perform the missing initializations. Fixes: d3b58c47d330d ("can: replace timestamp as unique skb attribute") Reported-by: syzbot+b02ff0707a97e4e79ebb@syzkaller.appspotmail.com Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> # >= v4.1 Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14kobject: Export kobject_get_unless_zero()Jan Kara
commit c70c176ff8c3ff0ac6ef9a831cd591ea9a66bd1a upstream. Make the function available for outside use and fortify it against NULL kobject. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12macvlan: do not assume mac_header is set in macvlan_broadcast()Eric Dumazet
[ Upstream commit 96cc4b69581db68efc9749ef32e9cf8e0160c509 ] Use of eth_hdr() in tx path is error prone. Many drivers call skb_reset_mac_header() before using it, but others do not. Commit 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") attempted to fix this generically, but commit d346a3fae3ff ("packet: introduce PACKET_QDISC_BYPASS socket option") brought back the macvlan bug. Lets add a new helper, so that tx paths no longer have to call skb_reset_mac_header() only to get a pointer to skb->data. Hopefully we will be able to revert 6d1ccff62780 ("net: reset mac header in dev_start_xmit()") and save few cycles in transmit fast path. BUG: KASAN: use-after-free in __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline] BUG: KASAN: use-after-free in mc_hash drivers/net/macvlan.c:251 [inline] BUG: KASAN: use-after-free in macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277 Read of size 4 at addr ffff8880a4932401 by task syz-executor947/9579 CPU: 0 PID: 9579 Comm: syz-executor947 Not tainted 5.5.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:145 __get_unaligned_cpu32 include/linux/unaligned/packed_struct.h:19 [inline] mc_hash drivers/net/macvlan.c:251 [inline] macvlan_broadcast+0x547/0x620 drivers/net/macvlan.c:277 macvlan_queue_xmit drivers/net/macvlan.c:520 [inline] macvlan_start_xmit+0x402/0x77f drivers/net/macvlan.c:559 __netdev_start_xmit include/linux/netdevice.h:4447 [inline] netdev_start_xmit include/linux/netdevice.h:4461 [inline] dev_direct_xmit+0x419/0x630 net/core/dev.c:4079 packet_direct_xmit+0x1a9/0x250 net/packet/af_packet.c:240 packet_snd net/packet/af_packet.c:2966 [inline] packet_sendmsg+0x260d/0x6220 net/packet/af_packet.c:2991 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:659 __sys_sendto+0x262/0x380 net/socket.c:1985 __do_sys_sendto net/socket.c:1997 [inline] __se_sys_sendto net/socket.c:1993 [inline] __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1993 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x442639 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc13549e08 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000442639 RDX: 000000000000000e RSI: 0000000020000080 RDI: 0000000000000003 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000403bb0 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 9389: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:513 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:527 __do_kmalloc mm/slab.c:3656 [inline] __kmalloc+0x163/0x770 mm/slab.c:3665 kmalloc include/linux/slab.h:561 [inline] tomoyo_realpath_from_path+0xc5/0x660 security/tomoyo/realpath.c:252 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129 security_inode_getattr+0xf2/0x150 security/security.c:1222 vfs_getattr+0x25/0x70 fs/stat.c:115 vfs_statx_fd+0x71/0xc0 fs/stat.c:145 vfs_fstat include/linux/fs.h:3265 [inline] __do_sys_newfstat+0x9b/0x120 fs/stat.c:378 __se_sys_newfstat fs/stat.c:375 [inline] __x64_sys_newfstat+0x54/0x80 fs/stat.c:375 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9389: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:335 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483 __cache_free mm/slab.c:3426 [inline] kfree+0x10a/0x2c0 mm/slab.c:3757 tomoyo_realpath_from_path+0x1a7/0x660 security/tomoyo/realpath.c:289 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_perm+0x230/0x430 security/tomoyo/file.c:822 tomoyo_inode_getattr+0x1d/0x30 security/tomoyo/tomoyo.c:129 security_inode_getattr+0xf2/0x150 security/security.c:1222 vfs_getattr+0x25/0x70 fs/stat.c:115 vfs_statx_fd+0x71/0xc0 fs/stat.c:145 vfs_fstat include/linux/fs.h:3265 [inline] __do_sys_newfstat+0x9b/0x120 fs/stat.c:378 __se_sys_newfstat fs/stat.c:375 [inline] __x64_sys_newfstat+0x54/0x80 fs/stat.c:375 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8880a4932000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 1025 bytes inside of 4096-byte region [ffff8880a4932000, ffff8880a4933000) The buggy address belongs to the page: page:ffffea0002924c80 refcount:1 mapcount:0 mapping:ffff8880aa402000 index:0x0 compound_mapcount: 0 raw: 00fffe0000010200 ffffea0002846208 ffffea00028f3888 ffff8880aa402000 raw: 0000000000000000 ffff8880a4932000 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880a4932300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4932380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880a4932400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880a4932480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880a4932500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: b863ceb7ddce ("[NET]: Add macvlan driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12regulator: ab8500: Remove AB8505 USB regulatorStephan Gerhold
commit 99c4f70df3a6446c56ca817c2d0f9c12d85d4e7c upstream. The USB regulator was removed for AB8500 in commit 41a06aa738ad ("regulator: ab8500: Remove USB regulator"). It was then added for AB8505 in commit 547f384f33db ("regulator: ab8500: add support for ab8505"). However, there was never an entry added for it in ab8505_regulator_match. This causes all regulators after it to be initialized with the wrong device tree data, eventually leading to an out-of-bounds array read. Given that it is not used anywhere in the kernel, it seems likely that similar arguments against supporting it exist for AB8505 (it is controlled by hardware). Therefore, simply remove it like for AB8500 instead of adding an entry in ab8505_regulator_match. Fixes: 547f384f33db ("regulator: ab8500: add support for ab8505") Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20191106173125.14496-1-stephan@gerhold.net Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()Florian Fainelli
commit 84b032dbfdf1c139cd2b864e43959510646975f8 upstream. This reverts commit 6bb86fefa086faba7b60bb452300b76a47cde1a5 ("libahci_platform: Staticize ahci_platform_<en/dis>able_phys()") we are going to need ahci_platform_{enable,disable}_phys() in a subsequent commit for ahci_brcm.c in order to properly control the PHY initialization order. Also make sure the function prototypes are declared in include/linux/ahci_platform.h as a result. Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-12dmaengine: Fix access to uninitialized dma_slave_capsLukas Wunner
commit 53a256a9b925b47c7e67fc1f16ca41561a7b877c upstream. dmaengine_desc_set_reuse() allocates a struct dma_slave_caps on the stack, populates it using dma_get_slave_caps() and then accesses one of its members. However dma_get_slave_caps() may fail and this isn't accounted for, leading to a legitimate warning of gcc-4.9 (but not newer versions): In file included from drivers/spi/spi-bcm2835.c:19:0: drivers/spi/spi-bcm2835.c: In function 'dmaengine_desc_set_reuse': >> include/linux/dmaengine.h:1370:10: warning: 'caps.descriptor_reuse' is used uninitialized in this function [-Wuninitialized] if (caps.descriptor_reuse) { Fix it, thereby also silencing the gcc-4.9 warning. The issue has been present for 4 years but surfaces only now that the first caller of dmaengine_desc_set_reuse() has been added in spi-bcm2835.c. Another user of reusable DMA descriptors has existed for a while in pxa_camera.c, but it sets the DMA_CTRL_REUSE flag directly instead of calling dmaengine_desc_set_reuse(). Nevertheless, tag this commit for stable in case there are out-of-tree users. Fixes: 272420214d26 ("dmaengine: Add DMA_CTRL_REUSE") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v4.3+ Link: https://lore.kernel.org/r/ca92998ccc054b4f2bfd60ef3adbab2913171eac.1575546234.git.lukas@wunner.de Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04tcp/dccp: fix possible race __inet_lookup_established()Eric Dumazet
commit 8dbd76e79a16b45b2ccb01d2f2e08dbf64e71e40 upstream. Michal Kubecek and Firo Yang did a very nice analysis of crashes happening in __inet_lookup_established(). Since a TCP socket can go from TCP_ESTABLISH to TCP_LISTEN (via a close()/socket()/listen() cycle) without a RCU grace period, I should not have changed listeners linkage in their hash table. They must use the nulls protocol (Documentation/RCU/rculist_nulls.txt), so that a lookup can detect a socket in a hash list was moved in another one. Since we added code in commit d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix"), we have to add hlist_nulls_add_tail_rcu() helper. Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Michal Kubecek <mkubecek@suse.cz> Reported-by: Firo Yang <firo.yang@suse.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Link: https://lore.kernel.org/netdev/20191120083919.GH27852@unicorn.suse.cz/ Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> [stable-4.9: we also need to update code in __inet_lookup_listener() and inet6_lookup_listener() which has been removed in 5.0-rc1.] Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04hrtimer: Annotate lockless access to timer->stateEric Dumazet
commit 56144737e67329c9aaed15f942d46a6302e2e3d8 upstream. syzbot reported various data-race caused by hrtimer_is_queued() reading timer->state. A READ_ONCE() is required there to silence the warning. Also add the corresponding WRITE_ONCE() when timer->state is set. In remove_hrtimer() the hrtimer_is_queued() helper is open coded to avoid loading timer->state twice. KCSAN reported these cases: BUG: KCSAN: data-race in __remove_hrtimer / tcp_pacing_check write to 0xffff8880b2a7d388 of 1 bytes by interrupt on cpu 0: __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991 __run_hrtimer kernel/time/hrtimer.c:1496 [inline] __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593 __do_softirq+0x115/0x33f kernel/softirq.c:292 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 read to 0xffff8880b2a7d388 of 1 bytes by task 24652 on cpu 1: tcp_pacing_check net/ipv4/tcp_output.c:2235 [inline] tcp_pacing_check+0xba/0x130 net/ipv4/tcp_output.c:2225 tcp_xmit_retransmit_queue+0x32c/0x5a0 net/ipv4/tcp_output.c:3044 tcp_xmit_recovery+0x7c/0x120 net/ipv4/tcp_input.c:3558 tcp_ack+0x17b6/0x3170 net/ipv4/tcp_input.c:3717 tcp_rcv_established+0x37e/0xf50 net/ipv4/tcp_input.c:5696 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561 sk_backlog_rcv include/net/sock.h:945 [inline] __release_sock+0x135/0x1e0 net/core/sock.c:2435 release_sock+0x61/0x160 net/core/sock.c:2951 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0x9f/0xc0 net/socket.c:657 BUG: KCSAN: data-race in __remove_hrtimer / __tcp_ack_snd_check write to 0xffff8880a3a65588 of 1 bytes by interrupt on cpu 0: __remove_hrtimer+0x52/0x130 kernel/time/hrtimer.c:991 __run_hrtimer kernel/time/hrtimer.c:1496 [inline] __hrtimer_run_queues+0x250/0x600 kernel/time/hrtimer.c:1576 hrtimer_run_softirq+0x10e/0x150 kernel/time/hrtimer.c:1593 __do_softirq+0x115/0x33f kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0xbb/0xe0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0xe6/0x280 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:830 read to 0xffff8880a3a65588 of 1 bytes by task 22891 on cpu 1: __tcp_ack_snd_check+0x415/0x4f0 net/ipv4/tcp_input.c:5265 tcp_ack_snd_check net/ipv4/tcp_input.c:5287 [inline] tcp_rcv_established+0x750/0xf50 net/ipv4/tcp_input.c:5708 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1561 sk_backlog_rcv include/net/sock.h:945 [inline] __release_sock+0x135/0x1e0 net/core/sock.c:2435 release_sock+0x61/0x160 net/core/sock.c:2951 sk_stream_wait_memory+0x3d7/0x7c0 net/core/stream.c:145 tcp_sendmsg_locked+0xb47/0x1f30 net/ipv4/tcp.c:1393 tcp_sendmsg+0x39/0x60 net/ipv4/tcp.c:1434 inet_sendmsg+0x6d/0x90 net/ipv4/af_inet.c:807 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg+0x9f/0xc0 net/socket.c:657 __sys_sendto+0x21f/0x320 net/socket.c:1952 __do_sys_sendto net/socket.c:1964 [inline] __se_sys_sendto net/socket.c:1960 [inline] __x64_sys_sendto+0x89/0xb0 net/socket.c:1960 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 24652 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [ tglx: Added comments ] Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191106174804.74723-1-edumazet@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-04libfdt: define INT32_MAX and UINT32_MAX in libfdt_env.hMasahiro Yamada
[ Upstream commit a8de1304b7df30e3a14f2a8b9709bb4ff31a0385 ] The DTC v1.5.1 added references to (U)INT32_MAX. This is no problem for user-space programs since <stdint.h> defines (U)INT32_MAX along with (u)int32_t. For the kernel space, libfdt_env.h needs to be adjusted before we pull in the changes. In the kernel, we usually use s/u32 instead of (u)int32_t for the fixed-width types. Accordingly, we already have S/U32_MAX for their max values. So, we should not add (U)INT32_MAX to <linux/limits.h> any more. Instead, add them to the in-kernel libfdt_env.h to compile the latest libfdt. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-04fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned longKonstantin Khlebnikov
[ Upstream commit 6fcbcec9cfc7b3c6a2c1f1a23ebacedff7073e0a ] Quota statistics counted as 64-bit per-cpu counter. Reading sums per-cpu fractions as signed 64-bit int, filters negative values and then reports lower half as signed 32-bit int. Result may looks like: fs.quota.allocated_dquots = 22327 fs.quota.cache_hits = -489852115 fs.quota.drops = -487288718 fs.quota.free_dquots = 22083 fs.quota.lookups = -486883485 fs.quota.reads = 22327 fs.quota.syncs = 335064 fs.quota.writes = 3088689 Values bigger than 2^31-1 reported as negative. All counters except "allocated_dquots" and "free_dquots" are monotonic, thus they should be reported as is without filtering negative values. Kernel doesn't have generic helper for 64-bit sysctl yet, let's use at least unsigned long. Link: https://lore.kernel.org/r/157337934693.2078.9842146413181153727.stgit@buzz Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>