summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2019-01-26sunrpc: use-after-free in svc_process_common()Vasily Averin
commit d4b09acf924b84bae77cad090a9d108e70b43643 upstream. if node have NFSv41+ mounts inside several net namespaces it can lead to use-after-free in svc_process_common() svc_process_common() /* Setup reply header */ rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE svc_process_common() can use incorrect rqstp->rq_xprt, its caller function bc_svc_process() takes it from serv->sv_bc_xprt. The problem is that serv is global structure but sv_bc_xprt is assigned per-netnamespace. According to Trond, the whole "let's set up rqstp->rq_xprt for the back channel" is nothing but a giant hack in order to work around the fact that svc_process_common() uses it to find the xpt_ops, and perform a couple of (meaningless for the back channel) tests of xpt_flags. All we really need in svc_process_common() is to be able to run rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr() Bruce J Fields points that this xpo_prep_reply_hdr() call is an awfully roundabout way just to do "svc_putnl(resv, 0);" in the tcp case. This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(), now it calls svc_process_common() with rqstp->rq_xprt = NULL. To adjust reply header svc_process_common() just check rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case. To handle rqstp->rq_xprt = NULL case in functions called from svc_process_common() patch intruduces net namespace pointer svc_rqst->rq_bc_net and adjust SVC_NET() definition. Some other function was also adopted to properly handle described case. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: stable@vger.kernel.org Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup") Signed-off-by: J. Bruce Fields <bfields@redhat.com> v2: - added lost extern svc_tcp_prep_reply_hdr() - context changes in svc_process_common() - dropped trace_svc_process() changes Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13ext4: force inode writes when nfsd calls commit_metadata()Theodore Ts'o
commit fde872682e175743e0c3ef939c89e3c6008a1529 upstream. Some time back, nfsd switched from calling vfs_fsync() to using a new commit_metadata() hook in export_operations(). If the file system did not provide a commit_metadata() hook, it fell back to using sync_inode_metadata(). Unfortunately doesn't work on all file systems. In particular, it doesn't work on ext4 due to how the inode gets journalled --- the VFS writeback code will not always call ext4_write_inode(). So we need to provide our own ext4_nfs_commit_metdata() method which calls ext4_write_inode() directly. Google-Bug-Id: 121195940 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-13sock: Make sock->sk_stamp thread-safeDeepa Dinamani
[ Upstream commit 3a0ed3e9619738067214871e9cb826fa23b2ddb9 ] Al Viro mentioned (Message-ID <20170626041334.GZ10672@ZenIV.linux.org.uk>) that there is probably a race condition lurking in accesses of sk_stamp on 32-bit machines. sock->sk_stamp is of type ktime_t which is always an s64. On a 32 bit architecture, we might run into situations of unsafe access as the access to the field becomes non atomic. Use seqlocks for synchronization. This allows us to avoid using spinlocks for readers as readers do not need mutual exclusion. Another approach to solve this is to require sk_lock for all modifications of the timestamps. The current approach allows for timestamps to have their own lock: sk_stamp_lock. This allows for the patch to not compete with already existing critical sections, and side effects are limited to the paths in the patch. The addition of the new field maintains the data locality optimizations from commit 9115e8cd2a0c ("net: reorganize struct sock for better data locality") Note that all the instances of the sk_stamp accesses are either through the ioctl or the syscall recvmsg. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-21posix-timers: Sanitize overrun handlingThomas Gleixner
commit 78c9c4dfbf8c04883941445a195276bb4bb92c76 upstream. The posix timer overrun handling is broken because the forwarding functions can return a huge number of overruns which does not fit in an int. As a consequence timer_getoverrun(2) and siginfo::si_overrun can turn into random number generators. The k_clock::timer_forward() callbacks return a 64 bit value now. Make k_itimer::ti_overrun[_last] 64bit as well, so the kernel internal accounting is correct. 3Remove the temporary (int) casts. Add a helper function which clamps the overrun value returned to user space via timer_getoverrun(2) or siginfo::si_overrun limited to a positive value between 0 and INT_MAX. INT_MAX is an indicator for user space that the overrun value has been clamped. Reported-by: Team OWL337 <icytxw@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Link: https://lkml.kernel.org/r/20180626132705.018623573@linutronix.de [florian: Make patch apply to v4.9.135] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-21ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal commandTakashi Sakamoto
commit e11f0f90a626f93899687b1cc909ee37dd6c5809 upstream. Drivers can implement 'struct snd_pcm_ops.ioctl' to handle some requests from ALSA PCM core. These requests are internal purpose in kernel land. Usually common set of operations are used for it. SNDRV_PCM_IOCTL1_INFO is one of the requests. According to code comment, it has been obsoleted in the old days. We can see old releases in ftp.alsa-project.org. The command was firstly introduced in v0.5.0 release as SND_PCM_IOCTL1_INFO, to allow drivers to fill data of 'struct snd_pcm_channel_info' type. In v0.9.0 release, this was obsoleted by the other commands for ioctl(2) such as SNDRV_PCM_IOCTL_CHANNEL_INFO. This commit removes the long-abandoned command, bye. Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17exec: avoid gcc-8 warning for get_task_commArnd Bergmann
commit 3756f6401c302617c5e091081ca4d26ab604bec5 upstream. gcc-8 warns about using strncpy() with the source size as the limit: fs/exec.c:1223:32: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess] This is indeed slightly suspicious, as it protects us from source arguments without NUL-termination, but does not guarantee that the destination is terminated. This keeps the strncpy() to ensure we have properly padded target buffer, but ensures that we use the correct length, by passing the actual length of the destination buffer as well as adding a build-time check to ensure it is exactly TASK_COMM_LEN. There are only 23 callsites which I all reviewed to ensure this is currently the case. We could get away with doing only the check or passing the right length, but it doesn't hurt to do both. Link: http://lkml.kernel.org/r/20171205151724.1764896-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Kees Cook <keescook@chromium.org> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Aleksa Sarai <asarai@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-17neighbour: Avoid writing before skb->head in neigh_hh_output()Stefano Brivio
[ Upstream commit e6ac64d4c4d095085d7dd71cbd05704ac99829b2 ] While skb_push() makes the kernel panic if the skb headroom is less than the unaligned hardware header size, it will proceed normally in case we copy more than that because of alignment, and we'll silently corrupt adjacent slabs. In the case fixed by the previous patch, "ipv6: Check available headroom in ip6_xmit() even without options", we end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware header and write 16 bytes, starting 2 bytes before the allocated buffer. Always check we're not writing before skb->head and, if the headroom is not enough, warn and drop the packet. v2: - instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet (Eric Dumazet) - if we avoid the panic, though, we need to explicitly check the headroom before the memcpy(), otherwise we'll have corrupted slabs on a running kernel, after we warn - use __skb_push() instead of skb_push(), as the headroom check is already implemented here explicitly (Eric Dumazet) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-13ALSA: pcm: Fix interval evaluation with openmin/maxTakashi Iwai
commit 5363857b916c1f48027e9b96ee8be8376bf20811 upstream. As addressed in alsa-lib (commit b420056604f0), we need to fix the case where the evaluation of PCM interval "(x x+1]" leading to -EINVAL. After applying rules, such an interval may be translated as "(x x+1)". Fixes: ff2d6acdf6f1 ("ALSA: pcm: Fix snd_interval_refine first/last with open min/max") Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-13USB: check usb_get_extra_descriptor for proper sizeMathias Payer
commit 704620afc70cf47abb9d6a1a57f3825d2bca49cf upstream. When reading an extra descriptor, we need to properly check the minimum and maximum size allowed, to prevent from invalid data being sent by a device. Reported-by: Hui Peng <benquike@gmail.com> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net> Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hui Peng <benquike@gmail.com> Signed-off-by: Mathias Payer <mathias.payer@nebelwelt.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-01can: dev: can_get_echo_skb(): factor out non sending code to ↵Marc Kleine-Budde
__can_get_echo_skb() commit a4310fa2f24687888ce80fdb0e88583561a23700 upstream. This patch factors out all non sending parts of can_get_echo_skb() into a seperate function __can_get_echo_skb(), so that it can be re-used in an upcoming patch. Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-27new helper: uaccess_kernel()Al Viro
commit db68ce10c4f0a27c1ff9fa0e789e5c41f8c4ea63 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [only take the include/linux/uaccess.h portion - gregkh] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-22libceph: bump CEPH_MSG_MAX_DATA_LENIlya Dryomov
commit 94e6992bb560be8bffb47f287194adf070b57695 upstream. If the read is large enough, we end up spinning in the messenger: libceph: osd0 192.168.122.1:6801 io error libceph: osd0 192.168.122.1:6801 io error libceph: osd0 192.168.122.1:6801 io error This is a receive side limit, so only reads were affected. Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-22TC: Set DMA masks for devicesMaciej W. Rozycki
commit 3f2aa244ee1a0d17ed5b6c86564d2c1b24d1c96b upstream. Fix a TURBOchannel support regression with commit 205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask") that caused coherent DMA allocations to produce a warning such as: defxx: v1.11 2014/07/01 Lawrence V. Stefani and others tc1: DEFTA at MMIO addr = 0x1e900000, IRQ = 20, Hardware addr = 08-00-2b-a3-a3-29 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 dfx_dev_register+0x670/0x678 Modules linked in: CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #2 Stack : ffffffff8009ffc0 fffffffffffffec0 0000000000000000 ffffffff80647650 0000000000000000 0000000000000000 ffffffff806f5f80 ffffffffffffffff 0000000000000000 0000000000000000 0000000000000001 ffffffff8065d4e8 98000000031b6300 ffffffff80563478 ffffffff805685b0 ffffffffffffffff 0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8 0000000000000000 0000000000000009 ffffffff8053efd0 ffffffff806657d0 0000000000000000 ffffffff803177f8 0000000000000000 ffffffff806d0000 9800000003078000 980000000307b9e0 000000001e900000 ffffffff80067940 0000000000000000 ffffffff805d6720 0000000000000204 ffffffff80388df8 ffffffff805176c0 ffffffff8004dc78 0000000000000000 ffffffff80067940 ... Call Trace: [<ffffffff8004dc78>] show_stack+0xa0/0x130 [<ffffffff80067940>] __warn+0x128/0x170 ---[ end trace b1d1e094f67f3bb2 ]--- This is because the TURBOchannel bus driver fails to set the coherent DMA mask for devices enumerated. Set the regular and coherent DMA masks for TURBOchannel devices then, observing that the bus protocol supports a 34-bit (16GiB) DMA address space, by interpreting the value presented in the address cycle across the 32 `ad' lines as a 32-bit word rather than byte address[1]. The architectural size of the TURBOchannel DMA address space exceeds the maximum amount of RAM any actual TURBOchannel system in existence may have, hence both masks are the same. This removes the warning shown above. References: [1] "TURBOchannel Hardware Specification", EK-369AA-OD-007B, Digital Equipment Corporation, January 1993, Section "DMA", pp. 1-15 -- 1-17 Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Patchwork: https://patchwork.linux-mips.org/patch/20835/ Fixes: 205e1b7f51e4 ("dma-mapping: warn when there is no coherent_dma_mask") Cc: stable@vger.kernel.org # 4.16+ Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-10unix: correctly track in-flight fds in sending process user_structHannes Frederic Sowa
[ Upstream commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6 ] The commit referenced in the Fixes tag incorrectly accounted the number of in-flight fds over a unix domain socket to the original opener of the file-descriptor. This allows another process to arbitrary deplete the original file-openers resource limit for the maximum of open files. Instead the sending processes and its struct cred should be credited. To do so, we add a reference counted struct user_struct pointer to the scm_fp_list and use it to account for the number of inflight unix fds. Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets") Reported-by: David Herrmann <dh.herrmann@gmail.com> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10net: fix warnings in 'make htmldocs' by moving macro definition out of field ↵Hannes Frederic Sowa
declaration [ Upstream commit 7bbadd2d1009575dad675afc16650ebb5aa10612 ] Docbook does not like the definition of macros inside a field declaration and adds a warning. Move the definition out. Fixes: 79462ad02e86180 ("net: add validation for the socket syscall protocol argument") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10tty: audit: Fix audit sourcePeter Hurley
[ Upstream commit 6b2a3d628aa752f0ab825fc6d4d07b09e274d1c1 ] The data to audit/record is in the 'from' buffer (ie., the input read buffer). Fixes: 72586c6061ab ("n_tty: Fix auditing support for cannonical mode") Cc: stable <stable@vger.kernel.org> # 4.1+ Cc: Miloslav Trmač <mitr@redhat.com> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Acked-by: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10lib: make memzero_explicit more robust against dead store eliminationDaniel Borkmann
[ Upstream commit 7829fb09a2b4268b30dd9bc782fa5ebee278b137 ] In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in case LTO would decide to inline memzero_explicit() and eventually find out it could be elimiated as dead store. While using barrier() works well for the case of gcc, recent efforts from LLVMLinux people suggest to use llvm as an alternative to gcc, and there, Stephan found in a simple stand-alone user space example that llvm could nevertheless optimize and thus elimitate the memset(). A similar issue has been observed in the referenced llvm bug report, which is regarded as not-a-bug. Based on some experiments, icc is a bit special on its own, while it doesn't seem to eliminate the memset(), it could do so with an own implementation, and then result in similar findings as with llvm. The fix in this patch now works for all three compilers (also tested with more aggressive optimization levels). Arguably, in the current kernel tree it's more of a theoretical issue, but imho, it's better to be pedantic about it. It's clearly visible with gcc/llvm though, with the below code: if we would have used barrier() only here, llvm would have omitted clearing, not so with barrier_data() variant: static inline void memzero_explicit(void *s, size_t count) { memset(s, 0, count); barrier_data(s); } int main(void) { char buff[20]; memzero_explicit(buff, sizeof(buff)); return 0; } $ gcc -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x0000000000400400 <+0>: lea -0x28(%rsp),%rax 0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp) 0x000000000040040e <+14>: movq $0x0,-0x20(%rsp) 0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp) 0x000000000040041f <+31>: xor %eax,%eax 0x0000000000400421 <+33>: retq End of assembler dump. $ clang -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x00000000004004f0 <+0>: xorps %xmm0,%xmm0 0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp) 0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp) 0x0000000000400500 <+16>: lea -0x18(%rsp),%rax 0x0000000000400505 <+21>: xor %eax,%eax 0x0000000000400507 <+23>: retq End of assembler dump. As gcc, clang, but also icc defines __GNUC__, it's sufficient to define this in compiler-gcc.h only to be picked up. For a fallback or otherwise unsupported compiler, we define it as a barrier. Similarly, for ecc which does not support gcc inline asm. Reference: https://llvm.org/bugs/show_bug.cgi?id=15495 Reported-by: Stephan Mueller <smueller@chronox.de> Tested-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Stephan Mueller <smueller@chronox.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: mancha security <mancha1@zoho.com> Cc: Mark Charlebois <charlebm@gmail.com> Cc: Behan Webster <behanw@converseincode.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10NFSv4: Cache the NFSv4/v4.1 client owner_id in the struct nfs_clientTrond Myklebust
[ Upstream commit ceb3a16c070c403f5f9ca46b46cf2bb79ea11750 ] Ensure that we cache the NFSv4/v4.1 client owner_id so that we can verify it when we're doing trunking detection. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10phy: phy-ti-pipe3: fix inconsistent enumeration of PCIe gen2 cardsVignesh R
[ Upstream commit 0bc09f9cdc589e0b54724096138996a00b19babb ] Prior to DRA74x silicon rev 1.1, pcie_pcs register bits 8-15 and bits 16-23 were used to configure RC delay count for phy1 and phy2 respectively. phyid was used as index to distinguish the phys and to configure the delay values appropriately. As of DRA74x silicon rev 1.1, pcie_pcs register definition has changed. Bits 16-23 are used to configure delay values for *both* phy1 and phy2. Hence phyid is no longer required. So, drop id field from ti_pipe3 structure and its subsequent references for configuring pcie_pcs register. Also, pcie_pcs register now needs to be configured with delay value of 0x96 at bit positions 16-23. See register description of CTRL_CORE_PCIE_PCS in ARM572x TRM, SPRUHZ6, October 2014, section 18.5.2.2, table 18-1804. This is needed to ensure Gen2 cards are enumerated consistently. DRA72x silicon behaves same way as DRA74x rev 1.1 as far as this functionality is considered. Test results on DRA74x and DRA72x EVMs: Before patch ------------ DRA74x ES 1.0: Gen1 cards work, Gen2 cards do not work (expected result due to silicon errata) DRA74x ES 1.1: Gen1 cards work, Gen2 cards do not work sometimes due to incorrect programming of register DRA72x: Gen1 cards work, Gen2 cards do not work sometimes due to incorrect programming of register After patch ----------- DRA74x ES 1.0: Gen1 cards work, Gen2 cards do not work (expected result due to silicon errata) DRA74x ES 1.1: Gen1 cards work, Gen2 cards work consistently. DRA72x: Gen1 and Gen2 cards enumerate consistently. Signed-off-by: Vignesh R <vigneshr@ti.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-10-13ebtables: arpreply: Add the standard target sanity checkGao Feng
commit c953d63548207a085abcb12a15fefc8a11ffdf0a upstream. The info->target comes from userspace and it would be used directly. So we need to add the sanity check to make sure it is a valid standard target, although the ebtables tool has already checked it. Kernel needs to validate anything coming from userspace. If the target is set as an evil value, it would break the ebtables and cause a panic. Because the non-standard target is treated as one offset. Now add one helper function ebt_invalid_target, and we would replace the macro INVALID_TARGET later. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Loic <hackurx@opensec.fr> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13Make file credentials available to the seqfile interfacesLinus Torvalds
commit 34dbbcdbf63360661ff7bda6c5f52f99ac515f92 upstream. A lot of seqfile users seem to be using things like %pK that uses the credentials of the current process, but that is actually completely wrong for filesystem interfaces. The unix semantics for permission checking files is to check permissions at _open_ time, not at read or write time, and that is not just a small detail: passing off stdin/stdout/stderr to a suid application and making the actual IO happen in privileged context is a classic exploit technique. So if we want to be able to look at permissions at read time, we need to use the file open credentials, not the current ones. Normal file accesses can just use "f_cred" (or any of the helper functions that do that, like file_ns_capable()), but the seqfile interfaces do not have any such options. It turns out that seq_file _does_ save away the user_ns information of the file, though. Since user_ns is just part of the full credential information, replace that special case with saving off the cred pointer instead, and suddenly seq_file has all the permission information it needs. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jann Horn <jannh@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13media: v4l: event: Prevent freeing event subscriptions while accessedSakari Ailus
commit ad608fbcf166fec809e402d548761768f602702c upstream. The event subscriptions are added to the subscribed event list while holding a spinlock, but that lock is subsequently released while still accessing the subscription object. This makes it possible to unsubscribe the event --- and freeing the subscription object's memory --- while the subscription object is simultaneously accessed. Prevent this by adding a mutex to serialise the event subscription and unsubscription. This also gives a guarantee to the callback ops that the add op has returned before the del op is called. This change also results in making the elems field less special: subscriptions are only added to the event list once they are fully initialised. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Hans Verkuil <hans.verkuil@cisco.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: stable@vger.kernel.org # for 4.14 and up Fixes: c3b5b0241f62 ("V4L/DVB: V4L: Events: Add backend") Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-13slub: make ->cpu_partial unsigned intAlexey Dobriyan
commit e5d9998f3e09359b372a037a6ac55ba235d95d57 upstream. /* * cpu_partial determined the maximum number of objects * kept in the per cpu partial lists of a processor. */ Can't be negative. Link: http://lkml.kernel.org/r/20180305200730.15812-15-adobriyan@gmail.com Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-26mm: get rid of vmacache_flush_all() entirelyLinus Torvalds
commit 7a9cdebdcc17e426fb5287e4a82db1dfe86339b2 upstream. Jann Horn points out that the vmacache_flush_all() function is not only potentially expensive, it's buggy too. It also happens to be entirely unnecessary, because the sequence number overflow case can be avoided by simply making the sequence number be 64-bit. That doesn't even grow the data structures in question, because the other adjacent fields are already 64-bit. So simplify the whole thing by just making the sequence number overflow case go away entirely, which gets rid of all the complications and makes the code faster too. Win-win. [ Oleg Nesterov points out that the VMACACHE_FULL_FLUSHES statistics also just goes away entirely with this ] Reported-by: Jann Horn <jannh@google.com> Suggested-by: Will Deacon <will.deacon@arm.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-09udlfb: set optimal write delayMikulas Patocka
commit bb24153a3f13dd0dbc1f8055ad97fe346d598f66 upstream. The default delay 5 jiffies is too much when the kernel is compiled with HZ=100 - it results in jumpy cursor in Xwindow. In order to find out the optimal delay, I benchmarked the driver on 1280x720x30fps video. I found out that with HZ=1000, 10ms is acceptable, but with HZ=250 or HZ=300, we need 4ms, so that the video is played without any frame skips. This patch changes the delay to this value. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-09-05scsi: sysfs: Introduce sysfs_{un,}break_active_protection()Bart Van Assche
commit 2afc9166f79b8f6da5f347f48515215ceee4ae37 upstream. Introduce these two functions and export them such that the next patch can add calls to these functions from the SCSI core. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-28netfilter: ipv6: nf_defrag: reduce struct net memory wasteEric Dumazet
[ Upstream commit 9ce7bc036ae4cfe3393232c86e9e1fea2153c237 ] It is a waste of memory to use a full "struct netns_sysctl_ipv6" while only one pointer is really used, considering netns_sysctl_ipv6 keeps growing. Also, since "struct netns_frags" has cache line alignment, it is better to move the frags_hdr pointer outside, otherwise we spend a full cache line for this pointer. This saves 192 bytes of memory per netns. Fixes: c038a767cd69 ("ipv6: add a new namespace for nf_conntrack_reasm") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-28vsock: split dwork to avoid reinitializationsCong Wang
[ Upstream commit 455f05ecd2b219e9a216050796d30c830d9bc393 ] syzbot reported that we reinitialize an active delayed work in vsock_stream_connect(): ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414 WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326 The pattern is apparently wrong, we should only initialize the dealyed work once and could repeatly schedule it. So we have to move out the initializations to allocation side. And to avoid confusion, we can split the shared dwork into two, instead of re-using the same one. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com> Cc: Andy king <acking@vmware.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-28llc: use refcount_inc_not_zero() for llc_sap_find()Cong Wang
[ Upstream commit 0dcb82254d65f72333aa50ad626d1e9665ad093b ] llc_sap_put() decreases the refcnt before deleting sap from the global list. Therefore, there is a chance llc_sap_find() could find a sap with zero refcnt in this global list. Close this race condition by checking if refcnt is zero or not in llc_sap_find(), if it is zero then it is being removed so we can just treat it as gone. Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-17crypto: vmac - separate tfm and request contextEric Biggers
commit bb29648102335586e9a66289a1d98a0cb392b6e5 upstream. syzbot reported a crash in vmac_final() when multiple threads concurrently use the same "vmac(aes)" transform through AF_ALG. The bug is pretty fundamental: the VMAC template doesn't separate per-request state from per-tfm (per-key) state like the other hash algorithms do, but rather stores it all in the tfm context. That's wrong. Also, vmac_final() incorrectly zeroes most of the state including the derived keys and cached pseudorandom pad. Therefore, only the first VMAC invocation with a given key calculates the correct digest. Fix these bugs by splitting the per-tfm state from the per-request state and using the proper init/update/final sequencing for requests. Reproducer for the crash: #include <linux/if_alg.h> #include <sys/socket.h> #include <unistd.h> int main() { int fd; struct sockaddr_alg addr = { .salg_type = "hash", .salg_name = "vmac(aes)", }; char buf[256] = { 0 }; fd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(fd, (void *)&addr, sizeof(addr)); setsockopt(fd, SOL_ALG, ALG_SET_KEY, buf, 16); fork(); fd = accept(fd, NULL, NULL); for (;;) write(fd, buf, 256); } The immediate cause of the crash is that vmac_ctx_t.partial_size exceeds VMAC_NHBYTES, causing vmac_final() to memset() a negative length. Reported-by: syzbot+264bca3a6e8d645550d3@syzkaller.appspotmail.com Fixes: f1939f7c5645 ("crypto: vmac - New hash algorithm for intel_txt support") Cc: <stable@vger.kernel.org> # v2.6.32+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-09ring_buffer: tracing: Inherit the tracing setting to next ring bufferMasami Hiramatsu
commit 73c8d8945505acdcbae137c2e00a1232e0be709f upstream. Maintain the tracing on/off setting of the ring_buffer when switching to the trace buffer snapshot. Taking a snapshot is done by swapping the backup ring buffer (max_tr_buffer). But since the tracing on/off setting is defined by the ring buffer, when swapping it, the tracing on/off setting can also be changed. This causes a strange result like below: /sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 0 > tracing_on /sys/kernel/debug/tracing # cat tracing_on 0 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 1 /sys/kernel/debug/tracing # echo 1 > snapshot /sys/kernel/debug/tracing # cat tracing_on 0 We don't touch tracing_on, but snapshot changes tracing_on setting each time. This is an anomaly, because user doesn't know that each "ring_buffer" stores its own tracing-enable state and the snapshot is done by swapping ring buffers. Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox Cc: Ingo Molnar <mingo@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp> Cc: stable@vger.kernel.org Fixes: debdd57f5145 ("tracing: Make a snapshot feature available from userspace") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> [ Updated commit log and comment in the code ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-09tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_modeEric Dumazet
[ Upstream commit 9a9c9b51e54618861420093ae6e9b50a961914c5 ] We want to add finer control of the number of ACK packets sent after ECN events. This patch is not changing current behavior, it only enables following change. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-28tcp: do not delay ACK in DCTCP upon CE status changeYuchung Cheng
[ Upstream commit a0496ef2c23b3b180902dd185d0d63ccbc624cf8 ] Per DCTCP RFC8257 (Section 3.2) the ACK reflecting the CE status change has to be sent immediately so the sender can respond quickly: """ When receiving packets, the CE codepoint MUST be processed as follows: 1. If the CE codepoint is set and DCTCP.CE is false, set DCTCP.CE to true and send an immediate ACK. 2. If the CE codepoint is not set and DCTCP.CE is true, set DCTCP.CE to false and send an immediate ACK. """ Previously DCTCP implementation may continue to delay the ACK. This patch fixes that to implement the RFC by forcing an immediate ACK. Tested with this packetdrill script provided by Larry Brakmo 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> 0.110 < [ect0] . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0 0.200 < [ect0] . 1:1001(1000) ack 1 win 257 0.200 > [ect01] . 1:1(0) ack 1001 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001 0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 +0.005 < [ce] . 2001:3001(1000) ack 2 win 257 +0.000 > [ect01] . 2:2(0) ack 2001 // Previously the ACK below would be delayed by 40ms +0.000 > [ect01] E. 2:2(0) ack 3001 +0.500 < F. 9501:9501(0) ack 4 win 257 Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-28tcp: do not cancel delay-AcK on DCTCP special ACKYuchung Cheng
[ Upstream commit 27cde44a259c380a3c09066fc4b42de7dde9b1ad ] Currently when a DCTCP receiver delays an ACK and receive a data packet with a different CE mark from the previous one's, it sends two immediate ACKs acking previous and latest sequences respectly (for ECN accounting). Previously sending the first ACK may mark off the delayed ACK timer (tcp_event_ack_sent). This may subsequently prevent sending the second ACK to acknowledge the latest sequence (tcp_ack_snd_check). The culprit is that tcp_send_ack() assumes it always acknowleges the latest sequence, which is not true for the first special ACK. The fix is to not make the assumption in tcp_send_ack and check the actual ack sequence before cancelling the delayed ACK. Further it's safer to pass the ack sequence number as a local variable into tcp_send_ack routine, instead of intercepting tp->rcv_nxt to avoid future bugs like this. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-28net: Don't copy pfmemalloc flag in __copy_skb_header()Stefano Brivio
[ Upstream commit 8b7008620b8452728cadead460a36f64ed78c460 ] The pfmemalloc flag indicates that the skb was allocated from the PFMEMALLOC reserves, and the flag is currently copied on skb copy and clone. However, an skb copied from an skb flagged with pfmemalloc wasn't necessarily allocated from PFMEMALLOC reserves, and on the other hand an skb allocated that way might be copied from an skb that wasn't. So we should not copy the flag on skb copy, and rather decide whether to allow an skb to be associated with sockets unrelated to page reclaim depending only on how it was allocated. Move the pfmemalloc flag before headers_start[0] using an existing 1-bit hole, so that __copy_skb_header() doesn't copy it. When cloning, we'll now take care of this flag explicitly, contravening to the warning comment of __skb_clone(). While at it, restore the newline usage introduced by commit b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()") to visually separate bytes used in bitfields after headers_start[0], that was gone after commit a9e419dc7be6 ("netfilter: merge ctinfo into nfct pointer storage area"), and describe the pfmemalloc flag in the kernel-doc structure comment. This doesn't change the size of sk_buff or cacheline boundaries, but consolidates the 15 bits hole before tc_index into a 2 bytes hole before csum, that could now be filled more easily. Reported-by: Patrick Talbert <ptalbert@redhat.com> Fixes: c93bdd0e03e8 ("netvm: allow skb allocation to use PFMEMALLOC reserves") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03block: Fix transfer when chunk sectors exceeds maxKeith Busch
commit 15bfd21fbc5d35834b9ea383dc458a1f0c9e3434 upstream. A device may have boundary restrictions where the number of sectors between boundaries exceeds its max transfer size. In this case, we need to cap the max size to the smaller of the two limits. Reported-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Tested-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com> Cc: <stable@vger.kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03branch-check: fix long->int truncation when profiling branchesMikulas Patocka
commit 2026d35741f2c3ece73c11eb7e4a15d7c2df9ebe upstream. The function __builtin_expect returns long type (see the gcc documentation), and so do macros likely and unlikely. Unfortunatelly, when CONFIG_PROFILE_ANNOTATED_BRANCHES is selected, the macros likely and unlikely expand to __branch_check__ and __branch_check__ truncates the long type to int. This unintended truncation may cause bugs in various kernel code (we found a bug in dm-writecache because of it), so it's better to fix __branch_check__ to return long. Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1805300818140.24812@file01.intranet.prod.int.rdu2.redhat.com Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 1f0d69a9fc815 ("tracing: profile likely and unlikely annotations") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-13tcp: avoid integer overflows in tcp_rcv_space_adjust()Eric Dumazet
commit 607065bad9931e72207b0cac365d7d4abc06bd99 upstream. When using large tcp_rmem[2] values (I did tests with 500 MB), I noticed overflows while computing rcvwin. Lets fix this before the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Wei Wang <weiwan@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [Backport: sysctl_tcp_rmem is not Namespace-ify'd in older kernels] Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-13cfg80211: further limit wiphy names to 64 bytesEric Biggers
commit 814596495dd2b9d4aab92d8f89cf19060d25d2ea upstream. wiphy names were recently limited to 128 bytes by commit a7cfebcb7594 ("cfg80211: limit wiphy names to 128 bytes"). As it turns out though, this isn't sufficient because dev_vprintk_emit() needs the syslog header string "SUBSYSTEM=ieee80211\0DEVICE=+ieee80211:$devname" to fit into 128 bytes. This triggered the "device/subsystem name too long" WARN when the device name was >= 90 bytes. As before, this was reproduced by syzbot by sending an HWSIM_CMD_NEW_RADIO command to the MAC80211_HWSIM generic netlink family. Fix it by further limiting wiphy names to 64 bytes. Reported-by: syzbot+e64565577af34b3768dc@syzkaller.appspotmail.com Fixes: a7cfebcb7594 ("cfg80211: limit wiphy names to 128 bytes") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30usb: gadget: composite: fix incorrect handling of OS desc requestsChris Dickens
[ Upstream commit 5d6ae4f0da8a64a185074dabb1b2f8c148efa741 ] When handling an OS descriptor request, one of the first operations is to zero out the request buffer using the wLength from the setup packet. There is no bounds checking, so a wLength > 4096 would clobber memory adjacent to the request buffer. Fix this by taking the min of wLength and the request buffer length prior to the memset. While at it, define the buffer length in a header file so that magic numbers don't appear throughout the code. When returning data to the host, the data length should be the min of the wLength and the valid data we have to return. Currently we are returning wLength, thus requests for a wLength greater than the amount of data in the OS descriptor buffer would return invalid (albeit zero'd) data following the valid descriptor data. Fix this by counting the number of bytes when constructing the data and using this when determining the length of the request. Signed-off-by: Chris Dickens <christopher.a.dickens@gmail.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30llc: properly handle dev_queue_xmit() return valueCong Wang
[ Upstream commit b85ab56c3f81c5a24b5a5213374f549df06430da ] llc_conn_send_pdu() pushes the skb into write queue and calls llc_conn_send_pdus() to flush them out. However, the status of dev_queue_xmit() is not returned to caller, in this case, llc_conn_state_process(). llc_conn_state_process() needs hold the skb no matter success or failure, because it still uses it after that, therefore we should hold skb before dev_queue_xmit() when that skb is the one being processed by llc_conn_state_process(). For other callers, they can just pass NULL and ignore the return value as they are. Reported-by: Noam Rathaus <noamr@beyondsecurity.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30net: Fix vlan untag for bridge and vlan_dev with reorder_hdr offToshiaki Makita
[ Upstream commit 4bbb3e0e8239f9079bf1fe20b3c0cb598714ae61 ] When we have a bridge with vlan_filtering on and a vlan device on top of it, packets would be corrupted in skb_vlan_untag() called from br_dev_xmit(). The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(), which makes use of skb->mac_len. In this function mac_len is meant for handling rx path with vlan devices with reorder_header disabled, but in tx path mac_len is typically 0 and cannot be used, which is the problem in this case. The current code even does not properly handle rx path (skb_vlan_untag() called from __netif_receive_skb_core()) with reorder_header off actually. In rx path single tag case, it works as follows: - Before skb_reorder_vlan_header() mac_header data v v +-------------------+-------------+------+---- | ETH | VLAN | ETH | | ADDRS | TPID | TCI | TYPE | +-------------------+-------------+------+---- <-------- mac_len ---------> <-------------> to be removed - After skb_reorder_vlan_header() mac_header data v v +-------------------+------+---- | ETH | ETH | | ADDRS | TYPE | +-------------------+------+---- <-------- mac_len ---------> This is ok, but in rx double tag case, it corrupts packets: - Before skb_reorder_vlan_header() mac_header data v v +-------------------+-------------+-------------+------+---- | ETH | VLAN | VLAN | ETH | | ADDRS | TPID | TCI | TPID | TCI | TYPE | +-------------------+-------------+-------------+------+---- <--------------- mac_len ----------------> <-------------> should be removed <---------------------------> actually will be removed - After skb_reorder_vlan_header() mac_header data v v +-------------------+------+---- | ETH | ETH | | ADDRS | TYPE | +-------------------+------+---- <--------------- mac_len ----------------> So, two of vlan tags are both removed while only inner one should be removed and mac_header (and mac_len) is broken. skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2), so use skb->data and skb->mac_header to calculate the right offset. Reported-by: Brandon Carpenter <brandon.carpenter@cypherpath.com> Fixes: a6e18ff11170 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30regulatory: add NUL to request alpha2Johannes Berg
[ Upstream commit 657308f73e674e86b60509a430a46e569bf02846 ] Similar to the ancient commit a5fe8e7695dc ("regulatory: add NUL to alpha2"), add another byte to alpha2 in the request struct so that when we use nla_put_string(), we don't overrun anything. Fixes: 73d54c9e74c4 ("cfg80211: add regulatory netlink multicast group") Reported-by: Kees Cook <keescook@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4Felix Fietkau
[ Upstream commit 651b9920d7a694ffb1f885aef2bbb068a25d9d66 ] This ensures that mac80211 allocated management frames are properly aligned, which makes copying them more efficient. For instance, mt76 uses iowrite32_copy to copy beacon frames to beacon template memory on the chip. Misaligned 32-bit accesses cause CPU exceptions on MIPS and should be avoided. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30x86/power: Fix swsusp_arch_resume prototypeArnd Bergmann
[ Upstream commit 328008a72d38b5bde6491e463405c34a81a65d3e ] The declaration for swsusp_arch_resume marks it as 'asmlinkage', but the definition in x86-32 does not, and it fails to include the header with the declaration. This leads to a warning when building with link-time-optimizations: kernel/power/power.h:108:23: error: type of 'swsusp_arch_resume' does not match original declaration [-Werror=lto-type-mismatch] extern asmlinkage int swsusp_arch_resume(void); ^ arch/x86/power/hibernate_32.c:148:0: note: 'swsusp_arch_resume' was previously declared here int swsusp_arch_resume(void) This moves the declaration into a globally visible header file and fixes up both x86 definitions to match it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Len Brown <len.brown@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Nicolas Pitre <nico@linaro.org> Cc: linux-pm@vger.kernel.org Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: Bart Van Assche <bart.vanassche@wdc.com> Link: https://lkml.kernel.org/r/20180202145634.200291-2-arnd@arndb.de Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30asm-generic: provide generic_pmdp_establish()Kirill A. Shutemov
[ Upstream commit c58f0bb77ed8bf93dfdde762b01cb67eebbdfc29 ] Patch series "Do not lose dirty bit on THP pages", v4. Vlastimil noted that pmdp_invalidate() is not atomic and we can lose dirty and access bits if CPU sets them after pmdp dereference, but before set_pmd_at(). The bug can lead to data loss, but the race window is tiny and I haven't seen any reports that suggested that it happens in reality. So I don't think it worth sending it to stable. Unfortunately, there's no way to address the issue in a generic way. We need to fix all architectures that support THP one-by-one. All architectures that have THP supported have to provide atomic pmdp_invalidate() that returns previous value. If generic implementation of pmdp_invalidate() is used, architecture needs to provide atomic pmdp_estabish(). pmdp_estabish() is not used out-side generic implementation of pmdp_invalidate() so far, but I think this can change in the future. This patch (of 12): This is an implementation of pmdp_establish() that is only suitable for an architecture that doesn't have hardware dirty/accessed bits. In this case we can't race with CPU which sets these bits and non-atomic approach is fine. Link: http://lkml.kernel.org/r/20171213105756.69879-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nitin Gupta <nitin.m.gupta@oracle.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into ↵Anna-Maria Gleixner
account [ Upstream commit 91633eed73a3ac37aaece5c8c1f93a18bae616a9 ] So far only CLOCK_MONOTONIC and CLOCK_REALTIME were taken into account as well as HRTIMER_MODE_ABS/REL in the hrtimer_init tracepoint. The query for detecting the ABS or REL timer modes is not valid anymore, it got broken by the introduction of HRTIMER_MODE_PINNED. HRTIMER_MODE_PINNED is not evaluated in the hrtimer_init() call, but for the sake of completeness print all given modes. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: keescook@chromium.org Link: http://lkml.kernel.org/r/20171221104205.7269-9-anna-maria@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30do d_instantiate/unlock_new_inode combinations safelyAl Viro
commit 1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream. For anything NFS-exported we do _not_ want to unlock new inode before it has grown an alias; original set of fixes got the ordering right, but missed the nasty complication in case of lockdep being enabled - unlock_new_inode() does lockdep_annotate_inode_mutex_key(inode) which can only be done before anyone gets a chance to touch ->i_mutex. Unfortunately, flipping the order and doing unlock_new_inode() before d_instantiate() opens a window when mkdir can race with open-by-fhandle on a guessed fhandle, leading to multiple aliases for a directory inode and all the breakage that follows from that. Correct solution: a new primitive (d_instantiate_new()) combining these two in the right order - lockdep annotate, then d_instantiate(), then the rest of unlock_new_inode(). All combinations of d_instantiate() with unlock_new_inode() should be converted to that. Cc: stable@kernel.org # 2.6.29 and later Tested-by: Mike Marshall <hubcap@omnibond.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25cfg80211: limit wiphy names to 128 bytesJohannes Berg
commit a7cfebcb7594a24609268f91299ab85ba064bf82 upstream. There's currently no limit on wiphy names, other than netlink message size and memory limitations, but that causes issues when, for example, the wiphy name is used in a uevent, e.g. in rfkill where we use the same name for the rfkill instance, and then the buffer there is "only" 2k for the environment variables. This was reported by syzkaller, which used a 4k name. Limit the name to something reasonable, I randomly picked 128. Reported-by: syzbot+230d9e642a85d3fec29c@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-25efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' ↵Ard Biesheuvel
definition for mixed mode commit 0b3225ab9407f557a8e20f23f37aa7236c10a9b1 upstream. Mixed mode allows a kernel built for x86_64 to interact with 32-bit EFI firmware, but requires us to define all struct definitions carefully when it comes to pointer sizes. 'struct efi_pci_io_protocol_32' currently uses a 'void *' for the 'romimage' field, which will be interpreted as a 64-bit field on such kernels, potentially resulting in bogus memory references and subsequent crashes. Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>