summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2017-11-26cs5536: add support for IDE controller variantAndrey Korolyov
commit 591b6bb605785c12a21e8b07a08a277065b655a5 upstream. Several legacy devices such as Geode-based Cisco ASA appliances and DB800 development board do possess CS5536 IDE controller with different PCI id than existing one. Using pata_generic is not always feasible as at least DB800 requires MSR quirk from pata_cs5536 to be used with vendor firmware. Signed-off-by: Andrey Korolyov <andrey@xdel.ru> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-26fcntl: Don't use ambiguous SIG_POLL si_codesEric W. Biederman
commit d08477aa975e97f1dc64c0ae59cebf98520456ce upstream. We have a weird and problematic intersection of features that when they all come together result in ambiguous siginfo values, that we can not support properly. - Supporting fcntl(F_SETSIG,...) with arbitrary valid signals. - Using positive values for POLL_IN, POLL_OUT, POLL_MSG, ..., etc that imply they are signal specific si_codes and using the aforementioned arbitrary signal to deliver them. - Supporting injection of arbitrary siginfo values for debugging and checkpoint/restore. The result is that just looking at siginfo si_codes of 1 to 6 are ambigious. It could either be a signal specific si_code or it could be a generic si_code. For most of the kernel this is a non-issue but for sending signals with siginfo it is impossible to play back the kernel signals and get the same result. Strictly speaking when the si_code was changed from SI_SIGIO to POLL_IN and friends between 2.2 and 2.4 this functionality was not ambiguous, as only real time signals were supported. Before 2.4 was released the kernel began supporting siginfo with non realtime signals so they could give details of why the signal was sent. The result is that if F_SETSIG is set to one of the signals with signal specific si_codes then user space can not know why the signal was sent. I grepped through a bunch of userspace programs using debian code search to get a feel for how often people choose a signal that results in an ambiguous si_code. I only found one program doing so and it was using SIGCHLD to test the F_SETSIG functionality, and did not appear to be a real world usage. Therefore the ambiguity does not appears to be a real world problem in practice. Remove the ambiguity while introducing the smallest chance of breakage by changing the si_code to SI_SIGIO when signals with signal specific si_codes are targeted. Fixes: v2.3.40 -- Added support for queueing non-rt signals Fixes: v2.3.21 -- Changed the si_code from SI_SIGIO Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-26signal: move the "sig < SIGRTMIN" check into siginmask(sig)Oleg Nesterov
commit 5c8ccefdf46c5f87d87b694c7fbc04941c2c99a5 upstream. All the users of siginmask() must ensure that sig < SIGRTMIN. sig_fatal() doesn't and this is wrong: UBSAN: Undefined behaviour in kernel/signal.c:911:6 shift exponent 32 is too large for 32-bit type 'long unsigned int' the patch doesn't add the neccesary check to sig_fatal(), it moves the check into siginmask() and updates other callers. Link: http://lkml.kernel.org/r/20160517195052.GA15187@redhat.com Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11perf: Avoid horrible stack usagePeter Zijlstra (Intel)
commit 86038c5ea81b519a8a1fcfcd5e4599aab0cdd119 upstream. Both Linus (most recent) and Steve (a while ago) reported that perf related callbacks have massive stack bloat. The problem is that software events need a pt_regs in order to properly report the event location and unwind stack. And because we could not assume one was present we allocated one on stack and filled it with minimal bits required for operation. Now, pt_regs is quite large, so this is undesirable. Furthermore it turns out that most sites actually have a pt_regs pointer available, making this even more onerous, as the stack space is pointless waste. This patch addresses the problem by observing that software events have well defined nesting semantics, therefore we can use static per-cpu storage instead of on-stack. Linus made the further observation that all but the scheduler callers of perf_sw_event() have a pt_regs available, so we change the regular perf_sw_event() to require a valid pt_regs (where it used to be optional) and add perf_sw_event_sched() for the scheduler. We have a scheduler specific call instead of a more generic _noregs() like construct because we can assume non-recursion from the scheduler and thereby simplify the code further (_noregs would have to put the recursion context call inline in order to assertain which __perf_regs element to use). One last note on the implementation of perf_trace_buf_prepare(); we allow .regs = NULL for those cases where we already have a pt_regs pointer available and do not need another. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Javi Merino <javi.merino@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11mtd: cfi: reduce stack sizeArnd Bergmann
commit d09957fbb4d0b059b3176b510540df69048ad170 upstream. The cfi_staa_write_buffers function uses a large amount of kernel stack whenever CONFIG_MTD_MAP_BANK_WIDTH_32 is set, and that results in a warning on ARM allmodconfig builds: drivers/mtd/chips/cfi_cmdset_0020.c: In function 'cfi_staa_write_buffers': drivers/mtd/chips/cfi_cmdset_0020.c:651:1: warning: the frame size of 1208 bytes is larger than 1024 bytes [-Wframe-larger-than=] It turns out that this is largely a result of a suboptimal implementation of map_word_andequal(). Replacing this function with a straightforward one reduces the stack size in this function by exactly 200 bytes, shrinks the .text segment for this file from 27648 bytes to 26608 bytes, and makes the warning go away. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11gpio: drop retval check enforcing from gpiochip_remove()Linus Walleij
commit 14c8a620ba436511b1347c592633befa49535176 upstream. As we start to decomission the return value from gpiochip_remove() the compilers emit warnings due to the function being tagged __must_check. So drop this until we remove the return value altogether. Cc: Abdoulaye Berthe <berthe.ab@gmail.com> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11cpumask_set_cpu_local_first => cpumask_local_spread, lamentRusty Russell
commit f36963c9d3f6f415732710da3acdd8608a9fa0e upstream. da91309e0a7e (cpumask: Utility function to set n'th cpu...) created a genuinely weird function. I never saw it before, it went through DaveM. (He only does this to make us other maintainers feel better about our own mistakes.) cpumask_set_cpu_local_first's purpose is say "I need to spread things across N online cpus, choose the ones on this numa node first"; you call it in a loop. It can fail. One of the two callers ignores this, the other aborts and fails the device open. It can fail in two ways: allocating the off-stack cpumask, or through a convoluted codepath which AFAICT can only occur if cpu_online_mask changes. Which shouldn't happen, because if cpu_online_mask can change while you call this, it could return a now-offline cpu anyway. It contains a nonsensical test "!cpumask_of_node(numa_node)". This was drawn to my attention by Geert, who said this causes a warning on Sparc. It sets a single bit in a cpumask instead of returning a cpu number, because that's what the callers want. It could be made more efficient by passing the previous cpu rather than an index, but that would be more invasive to the callers. Fixes: da91309e0a7e8966d916a74cce42ed170fde06bf Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased) Tested-by: Amir Vadai <amirv@mellanox.com> Acked-by: Amir Vadai <amirv@mellanox.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11module: fix types of device tables aliasesAndrey Ryabinin
commit 6301939d97d079f0d3dbe71e750f4daf5d39fc33 upstream. MODULE_DEVICE_TABLE() macro used to create aliases to device tables. Normally alias should have the same type as aliased symbol. Device tables are arrays, so they have 'struct type##_device_id[x]' types. Alias created by MODULE_DEVICE_TABLE() will have non-array type - 'struct type##_device_id'. This inconsistency confuses compiler, it could make a wrong assumption about variable's size which leads KASan to produce a false positive report about out of bounds access. For every global variable compiler calls __asan_register_globals() passing information about global variable (address, size, size with redzone, name ...) __asan_register_globals() poison symbols redzone to detect possible out of bounds accesses. When symbol has an alias __asan_register_globals() will be called as for symbol so for alias. Compiler determines size of variable by size of variable's type. Alias and symbol have the same address, so if alias have the wrong size part of memory that actually belongs to the symbol could be poisoned as redzone of alias symbol. By fixing type of alias symbol we will fix size of it, so __asan_register_globals() will not poison valid memory. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11dm: fix printk() rate limiting codeBart Van Assche
commit 604407890ecf624c2fb41013c82b22aade59b455 upstream. Using the same rate limiting state for different kinds of messages is wrong because this can cause a high frequency message to suppress a report of a low frequency message. Hence use a unique rate limiting state per message type. Fixes: 71a16736a15e ("dm: use local printk ratelimit") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11dm: convert DM printk macros to pr_<level> macrosJoe Perches
commit d2c3c8dcb5987b8352e82089c79a41b6e17e28d2 upstream. Using pr_<level> is the more common logging style. Standardize style and use new macro DM_FMT. Use no_printk in DMDEBUG macros when CONFIG_DM_DEBUG is not #defined. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11Clarify (and fix) MAX_LFS_FILESIZE macrosLinus Torvalds
commit 0cc3b0ec23ce4c69e1e890ed2b8d2fa932b14aad upstream. We have a MAX_LFS_FILESIZE macro that is meant to be filled in by filesystems (and other IO targets) that know they are 64-bit clean and don't have any 32-bit limits in their IO path. It turns out that our 32-bit value for that limit was bogus. On 32-bit, the VM layer is limited by the page cache to only 32-bit index values, but our logic for that was confusing and actually wrong. We used to define that value to (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1) which is actually odd in several ways: it limits the index to 31 bits, and then it limits files so that they can't have data in that last byte of a page that has the highest 31-bit index (ie page index 0x7fffffff). Neither of those limitations make sense. The index is actually the full 32 bit unsigned value, and we can use that whole full page. So the maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG". However, we do wan tto avoid the maximum index, because we have code that iterates over the page indexes, and we don't want that code to overflow. So the maximum size of a file on a 32-bit host should actually be one page less than the full 32-bit index. So the actual limit is ULONG_MAX << PAGE_SHIFT. That means that we will not actually be using the page of that last index (ULONG_MAX), but we can grow a file up to that limit. The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5 volume. It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one byte less), but the actual true VM limit is one page less than 16TiB. This was invisible until commit c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()"), which started applying that MAX_LFS_FILESIZE limit to block devices too. NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is actually just the offset type itself (loff_t), which is signed. But for clarity, on 64-bit, just use the maximum signed value, and don't make people have to count the number of 'f' characters in the hex constant. So just use LLONG_MAX for the 64-bit case. That was what the value had been before too, just written out as a hex constant. Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()") Reported-and-tested-by: Doug Nazar <nazard@nazar.ca> Cc: Andreas Dilger <adilger@dilger.ca> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Dave Kleikamp <shaggy@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.16: 32-bit definition still used PAGE_CACHE_SIZE] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11mm: migrate: prevent racy access to tlb_flush_pendingNadav Amit
commit 16af97dc5a8975371a83d9e30a64038b48f40a2d upstream. Patch series "fixes of TLB batching races", v6. It turns out that Linux TLB batching mechanism suffers from various races. Races that are caused due to batching during reclamation were recently handled by Mel and this patch-set deals with others. The more fundamental issue is that concurrent updates of the page-tables allow for TLB flushes to be batched on one core, while another core changes the page-tables. This other core may assume a PTE change does not require a flush based on the updated PTE value, while it is unaware that TLB flushes are still pending. This behavior affects KSM (which may result in memory corruption) and MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior). A proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED. Memory corruption in KSM is harder to produce in practice, but was observed by hacking the kernel and adding a delay before flushing and replacing the KSM page. Finally, there is also one memory barrier missing, which may affect architectures with weak memory model. This patch (of 7): Setting and clearing mm->tlb_flush_pending can be performed by multiple threads, since mmap_sem may only be acquired for read in task_numa_work(). If this happens, tlb_flush_pending might be cleared while one of the threads still changes PTEs and batches TLB flushes. This can lead to the same race between migration and change_protection_range() that led to the introduction of tlb_flush_pending. The result of this race was data corruption, which means that this patch also addresses a theoretically possible data corruption. An actual data corruption was not observed, yet the race was was confirmed by adding assertion to check tlb_flush_pending is not set by two threads, adding artificial latency in change_protection_range() and using sysctl to reduce kernel.numa_balancing_scan_delay_ms. Link: http://lkml.kernel.org/r/20170802000818.4760-2-namit@vmware.com Fixes: 20841405940e ("mm: fix TLB flush race between migration, and change_protection_range") Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Russell King <linux@armlinux.org.uk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.16: - Drop change to dump_mm() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) supportInbar Karmy
commit c994f778bb1cca8ebe7a4e528cefec233e93b5cc upstream. Currently when WoL is supported but disabled, ethtool reports: "Supports Wake-on: d". Fix the indication of Wol support, so that the indication remains "g" all the time if the NIC supports WoL. Tested: As accepted, when NIC supports WoL- ethtool reports: Supports Wake-on: g Wake-on: d when NIC doesn't support WoL- ethtool reports: Supports Wake-on: d Wake-on: d Fixes: 14c07b1358ed ("mlx4: Wake on LAN support") Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-11-11workqueue: implicit ordered attribute should be overridableTejun Heo
commit 0a94efb5acbb6980d7c9ab604372d93cd507e4d8 upstream. 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") automatically enabled ordered attribute for unbound workqueues w/ max_active == 1. Because ordered workqueues reject max_active and some attribute changes, this implicit ordered mode broke cases where the user creates an unbound workqueue w/ max_active == 1 and later explicitly changes the related attributes. This patch distinguishes explicit and implicit ordered setting and overrides from attribute changes if implict. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flagsZefan Li
commit 2ad654bc5e2b211e92f66da1d819e47d79a866f0 upstream. When we change cpuset.memory_spread_{page,slab}, cpuset will flip PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset. This should be done using atomic bitops, but currently we don't, which is broken. Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened when one thread tried to clear PF_USED_MATH while at the same time another thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on the same task. Here's the full report: https://lkml.org/lkml/2014/9/19/230 To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags. v4: - updated mm/slab.c. (Fengguang Wu) - updated Documentation. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Fixes: 950592f7b991 ("cpusets: update tasks' page/slab spread flags in time") Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12sched: add macros to define bitops for task atomic flagsZefan Li
commit e0e5070b20e01f0321f97db4e4e174f3f6b49e50 upstream. This will simplify code when we add new flags. v3: - Kees pointed out that no_new_privs should never be cleared, so we shouldn't define task_clear_no_new_privs(). we define 3 macros instead of a single one. v2: - updated scripts/tags.sh, suggested by Peter Cc: Ingo Molnar <mingo@kernel.org> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12sched: fix confusing PFA_NO_NEW_PRIVS constantZefan Li
commit a2b86f772227bcaf962c8b134f8d187046ac5f0e upstream. Commit 1d4457f99928 ("sched: move no_new_privs into new atomic flags") defined PFA_NO_NEW_PRIVS as hexadecimal value, but it is confusing because it is used as bit number. Redefine it as decimal bit number. Note this changes the bit position of PFA_NOW_NEW_PRIVS from 1 to 0. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Kees Cook <keescook@chromium.org> [ lizf: slightly modified subject and changelog ] Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12sched: move no_new_privs into new atomic flagsKees Cook
commit 1d4457f99928a968767f6405b4a1f50845aa15fd upstream. Since seccomp transitions between threads requires updates to the no_new_privs flag to be atomic, the flag must be part of an atomic flag set. This moves the nnp flag into a separate task field, and introduces accessors. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12PCI: Add Netronome NFP4000 PF device IDSimon Horman
commit 69874ec233871a62e1bc8c89e643993af93a8630 upstream. Add the device ID for the PF of the NFP4000. The device ID for the VF, 0x6003, is already present as PCI_DEVICE_ID_NETRONOME_NFP6000_VF. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12PCI: Add Netronome vendor and device IDsJason S. McMullan
commit a755e169031dac9ebaed03302c4921687c271d62 upstream. Device IDs for the Netronome NFP3200, NFP3240, NFP6000, and NFP6000 SR-IOV devices. Signed-off-by: Jason S. McMullan <jason.mcmullan@netronome.com> [simon: edited changelog] Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUGJason A. Donenfeld
commit daddef76c3deaaa7922f9d7b18edbf0a061215c3 upstream. The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG case was added with 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the usual definition of the dynamic_pr_debug macro, but alter it by adding a call to "net_ratelimit()" in the if statement. This is, in fact, the correct approach. However, while doing this, the author of the commit forgot to surround fmt by pr_fmt, resulting in unprefixed log messages appearing in the console. So, this commit adds back the pr_fmt(fmt) invocation, making net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and DYNAMIC_DEBUG cases, and bringing parity with the behavior of dynamic_pr_debug as well. Fixes: 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Tim Bingham <tbingham@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG caseTim Bingham
commit 2c94b53738549d81dc7464a32117d1f5112c64d3 upstream. Prior to commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG") the implementation of net_dbg_ratelimited() was buggy for both the DEBUG and CONFIG_DYNAMIC_DEBUG cases. The bug was that net_ratelimit() was being called and, despite returning true, nothing was being printed to the console. This resulted in messages like the following - "net_ratelimit: %d callbacks suppressed" with no other output nearby. After commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG") the bug is fixed for the DEBUG case. However, there's no output at all for CONFIG_DYNAMIC_DEBUG case. This patch restores debug output (if enabled) for the CONFIG_DYNAMIC_DEBUG case. Add a definition of net_dbg_ratelimited() for the CONFIG_DYNAMIC_DEBUG case. The implementation takes care to check that dynamic debugging is enabled before calling net_ratelimit(). Fixes: d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG") Signed-off-by: Tim Bingham <tbingham@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12net_dbg_ratelimited: turn into no-op when !DEBUGJason A. Donenfeld
commit d92cff89a0c80e7e49796366e441d97f07b5d321 upstream. The pr_debug family of functions turns into a no-op when -DDEBUG is not specified, opting instead to call "no_printk", which gets compiled to a no-op (but retains gcc's nice warnings about printf-style arguments). The problem with net_dbg_ratelimited is that it is defined to be a variant of net_ratelimited_function, which expands to essentially: if (net_ratelimit()) pr_debug(fmt, ...); When DEBUG is not defined, then this becomes, if (net_ratelimit()) ; This seems benign, except it isn't. Firstly, there's the obvious overhead of calling net_ratelimit needlessly, which does quite some book keeping for the rate limiting. Given that the pr_debug and net_dbg_ratelimited family of functions are sprinkled liberally through performance critical code, with developers assuming they'll be compiled out to a no-op most of the time, we certainly do not want this needless book keeping. Secondly, and most visibly, even though no debug message is printed when DEBUG is not defined, if there is a flood of invocations, dmesg winds up peppered with messages such as "net_ratelimit: 320 callbacks suppressed". This is because our aforementioned net_ratelimit() function actually prints this text in some circumstances. It's especially odd to see this when there isn't any other accompanying debug message. So, in sum, it doesn't make sense to have this function's current behavior, and instead it should match what every other debug family of functions in the kernel does with !DEBUG -- nothing. This patch replaces calls to net_dbg_ratelimited when !DEBUG with no_printk, keeping with the idiom of all the other debug print helpers. Also, though not strictly neccessary, it guards the call with an if (0) so that all evaluation of any arguments are sure to be compiled out. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12vfio: New external user group/file matchAlex Williamson
commit 5d6dee80a1e94cc284d03e06d930e60e8d3ecf7d upstream. At the point where the kvm-vfio pseudo device wants to release its vfio group reference, we can't always acquire a new reference to make that happen. The group can be in a state where we wouldn't allow a new reference to be added. This new helper function allows a caller to match a file to a group to facilitate this. Given a file and group, report if they match. Thus the caller needs to already have a group reference to match to the file. This allows the deletion of a group without acquiring a new reference. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12usb: Fix typo in the definition of Endpoint[out]RequestBenjamin Herrenschmidt
commit 7cf916bd639bd26db7214f2205bccdb4b9306256 upstream. The current definition is wrong. This breaks my upcoming Aspeed virtual hub driver. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-10-12sched/fair, cpumask: Export for_each_cpu_wrap()Peter Zijlstra
commit c743f0a5c50f2fcbc628526279cfa24f3dabe182 upstream. More users for for_each_cpu_wrap() have appeared. Promote the construct to generic cpumask interface. The implementation is slightly modified to reduce arguments. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Lauro Ramos Venancio <lvenanci@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: lwang@redhat.com Link: http://lkml.kernel.org/r/20170414122005.o35me2h5nowqkxbv@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.16: there's no old version of the function to delete] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-09-15ptrace: use fsuid, fsgid, effective creds for fs access checksJann Horn
commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream. By checking the effective credentials instead of the real UID / permitted capabilities, ensure that the calling process actually intended to use its credentials. To ensure that all ptrace checks use the correct caller credentials (e.g. in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS flag), use two new flags and require one of them to be set. The problem was that when a privileged task had temporarily dropped its privileges, e.g. by calling setreuid(0, user_uid), with the intent to perform following syscalls with the credentials of a user, it still passed ptrace access checks that the user would not be able to pass. While an attacker should not be able to convince the privileged task to perform a ptrace() syscall, this is a problem because the ptrace access check is reused for things in procfs. In particular, the following somewhat interesting procfs entries only rely on ptrace access checks: /proc/$pid/stat - uses the check for determining whether pointers should be visible, useful for bypassing ASLR /proc/$pid/maps - also useful for bypassing ASLR /proc/$pid/cwd - useful for gaining access to restricted directories that contain files with lax permissions, e.g. in this scenario: lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar drwx------ root root /root drwxr-xr-x root root /root/foobar -rw-r--r-- root root /root/foobar/secret Therefore, on a system where a root-owned mode 6755 binary changes its effective credentials as described and then dumps a user-specified file, this could be used by an attacker to reveal the memory layout of root's processes or reveal the contents of files he is not allowed to access (through /proc/$pid/cwd). [akpm@linux-foundation.org: fix warning] Signed-off-by: Jann Horn <jann@thejh.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.16: - Update mm_access() calls in fs/proc/task_{,no}mmu.c too - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-09-15srcu: Allow use of Classic SRCU from both process and interrupt contextPaolo Bonzini
commit 1123a6041654e8f889014659593bad4168e542c2 upstream. Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting down a guest running iperf on a VFIO assigned device. This happens because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt context, while a worker thread does the same inside kvm_set_irq(). If the interrupt happens while the worker thread is executing __srcu_read_lock(), updates to the Classic SRCU ->lock_count[] field or the Tree SRCU ->srcu_lock_count[] field can be lost. The docs say you are not supposed to call srcu_read_lock() and srcu_read_unlock() from irq context, but KVM interrupt injection happens from (host) interrupt context and it would be nice if SRCU supported the use case. KVM is using SRCU here not really for the "sleepable" part, but rather due to its IPI-free fast detection of grace periods. It is therefore not desirable to switch back to RCU, which would effectively revert commit 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING", 2014-01-16). However, the docs are overly conservative. You can have an SRCU instance only has users in irq context, and you can mix process and irq context as long as process context users disable interrupts. In addition, __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and Classic SRCU. For those two implementations, only srcu_read_lock() is unsafe. When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(), in commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments. Therefore it kept __this_cpu_inc(), with preempt_disable/enable in the caller. Tree SRCU however only does one increment, so on most architectures it is more efficient for __srcu_read_lock() to use this_cpu_inc(), and any performance differences appear to be down in the noise. Fixes: 719d93cd5f5c ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING") Reported-by: Linu Cherian <linuc.decode@gmail.com> Suggested-by: Linu Cherian <linuc.decode@gmail.com> Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [bwh: Backported to 3.16: __srcu_read_lock() still updates two different counters. So follow what _this_cpu_generic_to_op() does and use raw_local_irq_{save,restore}() and raw_cpu_ptr().] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-09-15rcu: Move preemption disabling out of __srcu_read_lock()Paul E. McKenney
commit 49f5903b473c5f63f3b57856d1bd4593db0a2eef upstream. Currently, __srcu_read_lock() cannot be invoked from restricted environments because it contains calls to preempt_disable() and preempt_enable(), both of which can invoke lockdep, which is a bad idea in some restricted execution modes. This commit therefore moves the preempt_disable() and preempt_enable() from __srcu_read_lock() to srcu_read_lock(). It also inserts the preempt_disable() and preempt_enable() around the call to __srcu_read_lock() in do_exit(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> [bwh: Backported to 3.16: - Drop changes in do_exit() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-09-15tracing/kprobes: Enforce kprobes teardown after testingThomas Gleixner
commit 30e7d894c1478c88d50ce94ddcdbd7f9763d9cdd upstream. Enabling the tracer selftest triggers occasionally the warning in text_poke(), which warns when the to be modified page is not marked reserved. The reason is that the tracer selftest installs kprobes on functions marked __init for testing. These probes are removed after the tests, but that removal schedules the delayed kprobes_optimizer work, which will do the actual text poke. If the work is executed after the init text is freed, then the warning triggers. The bug can be reproduced reliably when the work delay is increased. Flush the optimizer work and wait for the optimizing/unoptimizing lists to become empty before returning from the kprobes tracer selftest. That ensures that all operations which were queued due to the probes removal have completed. Link: http://lkml.kernel.org/r/20170516094802.76a468bb@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 6274de498 ("kprobes: Support delayed unoptimizing") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26net/mlx4: Fix the check in attaching steering rulesTalat Batheesh
commit 6dc06c08bef1c746ff8da33dab677cfbacdcad32 upstream. Our previous patch (cited below) introduced a regression for RAW Eth QPs. Fix it by checking if the QP number provided by user-space exists, hence allowing steering rules to be added for valid QPs only. Fixes: 89c557687a32 ("net/mlx4_en: Avoid adding steering rules with invalid ring") Reported-by: Or Gerlitz <gerlitz.or@gmail.com> Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26dentry name snapshotsAl Viro
commit 49d31c2f389acfe83417083e1208422b4091cd9e upstream. take_dentry_name_snapshot() takes a safe snapshot of dentry name; if the name is a short one, it gets copied into caller-supplied structure, otherwise an extra reference to external name is grabbed (those are never modified). In either case the pointer to stable string is stored into the same structure. dentry must be held by the caller of take_dentry_name_snapshot(), but may be freely dropped afterwards - the snapshot will stay until destroyed by release_dentry_name_snapshot(). Intended use: struct name_snapshot s; take_dentry_name_snapshot(&s, dentry); ... access s.name ... release_dentry_name_snapshot(&s); Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name to pass down with event. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [carnil: backport 4.9: adjust context] [bwh: Backported to 3.16: - External names are not ref-counted, so copy them - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-08-26ima: pass 'opened' flag to identify newly created filesDmitry Kasatkin
commit 3034a146820c26fe6da66a45f6340fe87fe0983a upstream. Empty files and missing xattrs do not guarantee that a file was just created. This patch passes FILE_CREATED flag to IMA to reliably identify new files. Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> [bwh: Backported to 3.16: drop change to ima_fw_from_file()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-07-18include/linux/kernel.h: change abs() macro so it uses consistent return typeMichal Nazarewicz
commit 8f57e4d930d48217268315898212518d4d3e0773 upstream. Rewrite abs() so that its return type does not depend on the architecture and no unexpected type conversion happen inside of it. The only conversion is from unsigned to signed type. char is left as a return type but treated as a signed type regradless of it's actual signedness. With the old version, int arguments were promoted to long and depending on architecture a long argument might result in s64 or long return type (which may or may not be the same). This came after some back and forth with Nicolas. The current macro has different return type (for the same input type) depending on architecture which might be midly iritating. An alternative version would promote to int like so: #define abs(x) __abs_choose_expr(x, long long, \ __abs_choose_expr(x, long, \ __builtin_choose_expr( \ sizeof(x) <= sizeof(int), \ ({ int __x = (x); __x<0?-__x:__x; }), \ ((void)0)))) I have no preference but imagine Linus might. :] Nicolas argument against is that promoting to int causes iconsistent behaviour: int main(void) { unsigned short a = 0, b = 1, c = a - b; unsigned short d = abs(a - b); unsigned short e = abs(c); printf("%u %u\n", d, e); // prints: 1 65535 } Then again, no sane person expects consistent behaviour from C integer arithmetic. ;) Note: __builtin_types_compatible_p(unsigned char, char) is always false, and __builtin_types_compatible_p(signed char, char) is also always false. Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.16: adjust filename] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-07-18kernel.h: make abs() work with 64-bit typesMichal Nazarewicz
commit c8299cb605b27dd5a49f7a69e48fd23e5a206298 upstream. For 64-bit arguments, the abs macro casts it to an int which leads to lost precision and may cause incorrect results. To deal with 64-bit types abs64 macro has been introduced but still there are places where abs macro is used incorrectly. To deal with the problem, expand abs macro such that it operates on s64 type when dealing with 64-bit types while still returning long when dealing with smaller types. This fixes one known bug (per John): The internal clocksteering done for fine-grained error correction uses a : logarithmic approximation, so any time adjtimex() adjusts the clock : steering, timekeeping_freqadjust() quickly approximates the correct clock : frequency over a series of ticks. : : Unfortunately, the logic in timekeeping_freqadjust(), introduced in commit : dc491596f639438 (Rework frequency adjustments to work better w/ nohz), : used the abs() function with a s64 error value to calculate the size of : the approximated adjustment to be made. : : Per include/linux/kernel.h: "abs() should not be used for 64-bit types : (s64, u64, long long) - use abs64()". : : Thus on 32-bit platforms, this resulted in the clocksteering to take a : quite dampended random walk trying to converge on the proper frequency, : which caused the adjustments to be made much slower then intended (most : easily observed when large adjustments are made). Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-07-18KVM: kvm_io_bus_unregister_dev() should never failDavid Hildenbrand
commit 90db10434b163e46da413d34db8d0e77404cc645 upstream. No caller currently checks the return value of kvm_io_bus_unregister_dev(). This is evil, as all callers silently go on freeing their device. A stale reference will remain in the io_bus, getting at least used again, when the iobus gets teared down on kvm_destroy_vm() - leading to use after free errors. There is nothing the callers could do, except retrying over and over again. So let's simply remove the bus altogether, print an error and make sure no one can access this broken bus again (returning -ENOMEM on any attempt to access it). Fixes: e93f8a0f821e ("KVM: convert io_bus to SRCU") Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.16: - Drop changes to kvm_io_bus_get_dev() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-07-18bpf: try harder on clones when writing into skbDaniel Borkmann
commit 3697649ff29e0f647565eed04b27a7779c646a22 upstream. When we're dealing with clones and the area is not writeable, try harder and get a copy via pskb_expand_head(). Replace also other occurences in tc actions with the new skb_try_make_writable(). Reported-by: Ashhad Sheikh <ashhadsheikh394@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: drop changes to bpf; only tc actions need fixing] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-07-18usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirkSamuel Thibault
commit 3243367b209faed5c320a4e5f9a565ee2a2ba958 upstream. Some USB 2.0 devices erroneously report millisecond values in bInterval. The generic config code manages to catch most of them, but in some cases it's not completely enough. The case at stake here is a USB 2.0 braille device, which wants to announce 10ms and thus sets bInterval to 10, but with the USB 2.0 computation that yields to 64ms. It happens that one can type fast enough to reach this interval and get the device buffers overflown, leading to problematic latencies. The generic config code does not catch this case because the 64ms is considered a sane enough value. This change thus adds a USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL quirk to mark devices which actually report milliseconds in bInterval, and marks Vario Ultra devices as needing it. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-07-18net: phy: Do not perform software reset for Generic PHYFlorian Fainelli
commit 0878fff1f42c18e448ab5b8b4f6a3eb32365b5b6 upstream. The Generic PHY driver is a catch-all PHY driver and it should preserve whatever prior initialization has been done by boot loader or firmware agents. For specific PHY device configuration it is expected that a specialized PHY driver would take over that role. Resetting the generic PHY was a bad idea that has lead to several complaints and downstream workarounds e.g: in OpenWrt/LEDE so restore the behavior prior to 87aa9f9c61ad ("net: phy: consolidate PHY reset in phy_init_hw()"). Reported-by: Felix Fietkau <nbd@nbd.name> Fixes: 87aa9f9c61ad ("net: phy: consolidate PHY reset in phy_init_hw()") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-07-02mm: larger stack guard gap, between vmasHugh Dickins
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream. Stack guard page is a useful feature to reduce a risk of stack smashing into a different mapping. We have been using a single page gap which is sufficient to prevent having stack adjacent to a different mapping. But this seems to be insufficient in the light of the stack usage in userspace. E.g. glibc uses as large as 64kB alloca() in many commonly used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX] which is 256kB or stack strings with MAX_ARG_STRLEN. This will become especially dangerous for suid binaries and the default no limit for the stack size limit because those applications can be tricked to consume a large portion of the stack and a single glibc call could jump over the guard page. These attacks are not theoretical, unfortunatelly. Make those attacks less probable by increasing the stack guard gap to 1MB (on systems with 4k pages; but make it depend on the page size because systems with larger base pages might cap stack allocations in the PAGE_SIZE units) which should cover larger alloca() and VLA stack allocations. It is obviously not a full fix because the problem is somehow inherent, but it should reduce attack space a lot. One could argue that the gap size should be configurable from userspace, but that can be done later when somebody finds that the new 1MB is wrong for some special case applications. For now, add a kernel command line option (stack_guard_gap) to specify the stack gap size (in page units). Implementation wise, first delete all the old code for stack guard page: because although we could get away with accounting one extra page in a stack vma, accounting a larger gap can break userspace - case in point, a program run with "ulimit -S -v 20000" failed when the 1MB gap was counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK and strict non-overcommit mode. Instead of keeping gap inside the stack vma, maintain the stack guard gap as a gap between vmas: using vm_start_gap() in place of vm_start (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few places which need to respect the gap - mainly arch_get_unmapped_area(), and and the vma tree's subtree_gap support for that. Original-patch-by: Oleg Nesterov <oleg@redhat.com> Original-patch-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [Hugh Dickins: Backported to 3.16] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-06-05locking/static_keys: Add static_key_{en,dis}able() helpersPeter Zijlstra
commit e33886b38cc82a9fc3b2d655dfc7f50467594138 upstream. Add two helpers to make it easier to treat the refcount as boolean. Suggested-by: Jason Baron <jasonbaron0@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-06-05net/mlx4: && vs & typoDan Carpenter
commit b2d0fe35471d1a71471f99147ffb5986bd60e744 upstream. Bitwise & was obviously intended here. Fixes: 745d8ae4622c ("net/mlx4: Spoofcheck and zero MAC can't coexist") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-06-05net/mlx4: Spoofcheck and zero MAC can't coexistEugenia Emantayev
commit 745d8ae4622c6808b22e33a944c7decb30074be4 upstream. Spoofcheck can't be enabled if VF MAC is zero. Vice versa, can't zero MAC if spoofcheck is on. Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support') Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-06-05mm: fix <linux/pagemap.h> stray kernel-doc notationRandy Dunlap
commit 083fb8edda0487d192e8c117f625563b920cf7a4 upstream. Delete stray (second) function description in find_lock_page() kernel-doc notation. Note: scripts/kernel-doc just ignores the second function description. Fixes: 2457aec63745e ("mm: non-atomically mark page accessed during page cache allocation where possible") Link: http://lkml.kernel.org/r/b037e9a3-516c-ec02-6c8e-fa5479747ba6@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-06-05nlm: Ensure callback code also checks that the files matchTrond Myklebust
commit 251af29c320d86071664f02c76f0d063a19fefdf upstream. It is not sufficient to just check that the lock pids match when granting a callback, we also need to ensure that we're granting the callback on the right file. Reported-by: Pankaj Singh <psingh.ait@gmail.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-06-05locking/ww_mutex: Fix compilation of __WW_MUTEX_INITIALIZERChris Wilson
commit af2e859edd477fa1ea3d1d106f41a595cff3d162 upstream. From conflicting macro parameters, passing the wrong name to __MUTEX_INITIALIZER and a stray '\', #define __WW_MUTEX_INITIALIZER was very unhappy. One unnecessary change was to choose to pass &ww_class instead of implicitly taking the address of the class within the macro. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 1b375dc30710 ("mutex: Move ww_mutex definitions to ww_mutex.h") Link: http://lkml.kernel.org/r/20161201114711.28697-2-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-04-04vfs: Commit to never having exectuables on proc and sysfs.Eric W. Biederman
commit 22f6b4d34fcf039c63a94e7670e0da24f8575a5a upstream. Today proc and sysfs do not contain any executable files. Several applications today mount proc or sysfs without noexec and nosuid and then depend on there being no exectuables files on proc or sysfs. Having any executable files show on proc or sysfs would cause a user space visible regression, and most likely security problems. Therefore commit to never allowing executables on proc and sysfs by adding a new flag to mark them as filesystems without executables and enforce that flag. Test the flag where MNT_NOEXEC is tested today, so that the only user visible effect will be that exectuables will be treated as if the execute bit is cleared. The filesystems proc and sysfs do not currently incoporate any executable files so this does not result in any user visible effects. This makes it unnecessary to vet changes to proc and sysfs tightly for adding exectuable files or changes to chattr that would modify existing files, as no matter what the individual file say they will not be treated as exectuable files by the vfs. Not having to vet changes to closely is important as without this we are only one proc_create call (or another goof up in the implementation of notify_change) from having problematic executables on proc. Those mistakes are all too easy to make and would create a situation where there are security issues or the assumptions of some program having to be broken (and cause userspace regressions). Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> [bwh: Backported to 3.16: we don't have super_block::s_iflags; use file_system_type::fs_flags instead] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-04-04blk: rq_data_dir() should not return a booleanLinus Torvalds
commit 10fbd36e362a0f367e34a7cd876a81295d8fc5ca upstream. rq_data_dir() returns either READ or WRITE (0 == READ, 1 == WRITE), not a boolean value. Now, admittedly the "!= 0" doesn't really change the value (0 stays as zero, 1 stays as one), but it's not only redundant, it confuses gcc, and causes gcc to warn about the construct switch (rq_data_dir(req)) { case READ: ... case WRITE: ... that we have in a few drivers. Now, the gcc warning is silly and stupid (it seems to warn not about the switch value having a different type from the case statements, but about _any_ boolean switch value), but in this case the code itself is silly and stupid too, so let's just change it, and get rid of warnings like this: drivers/block/hd.c: In function ‘hd_request’: drivers/block/hd.c:630:11: warning: switch condition has boolean value [-Wswitch-bool] switch (rq_data_dir(req)) { The odd '!= 0' came in when "cmd_flags" got turned into a "u64" in commit 5953316dbf90 ("block: make rq->cmd_flags be 64-bit") and is presumably because the old code (that just did a logical 'and' with 1) would then end up making the type of rq_data_dir() be u64 too. But if we want to retain the old regular integer type, let's just cast the result to 'int' rather than use that rather odd '!= 0'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Arnd Bergmann <arnd@arndb.de>
2017-03-16can: Fix kernel panic at security_sock_rcv_skbEric Dumazet
[ Upstream commit f1712c73714088a7252d276a57126d56c7d37e64 ] Zhang Yanmin reported crashes [1] and provided a patch adding a synchronize_rcu() call in can_rx_unregister() The main problem seems that the sockets themselves are not RCU protected. If CAN uses RCU for delivery, then sockets should be freed only after one RCU grace period. Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's ease stable backports with the following fix instead. [1] BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0 Call Trace: <IRQ> [<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60 [<ffffffff81d55771>] sk_filter+0x41/0x210 [<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0 [<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0 [<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370 [<ffffffff81f07af9>] can_receive+0xd9/0x120 [<ffffffff81f07beb>] can_rcv+0xab/0x100 [<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0 [<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0 [<ffffffff81d37f67>] process_backlog+0x127/0x280 [<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0 [<ffffffff810c88d4>] __do_softirq+0x184/0x440 [<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30 <EOI> [<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40 [<ffffffff810c8bed>] do_softirq+0x1d/0x20 [<ffffffff81d30085>] netif_rx_ni+0xe5/0x110 [<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520 [<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230 [<ffffffff810e3baf>] process_one_work+0x24f/0x670 [<ffffffff810e44ed>] worker_thread+0x9d/0x6f0 [<ffffffff810e4450>] ? rescuer_thread+0x480/0x480 [<ffffffff810ebafc>] kthread+0x12c/0x150 [<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70 Reported-by: Zhang Yanmin <yanmin.zhang@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2017-03-16tcp: fix tcp_fastopen unaligned access complaints on sparcShannon Nelson
[ Upstream commit 003c941057eaa868ca6fedd29a274c863167230d ] Fix up a data alignment issue on sparc by swapping the order of the cookie byte array field with the length field in struct tcp_fastopen_cookie, and making it a proper union to clean up the typecasting. This addresses log complaints like these: log_unaligned: 113 callbacks suppressed Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360 Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360 Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360 Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360 Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360 Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>