summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-12-17Linux 4.14.7v4.14.7Greg Kroah-Hartman
2017-12-17dvb_frontend: don't use-after-free the frontend structMauro Carvalho Chehab
commit b1cb7372fa822af6c06c8045963571d13ad6348b upstream. dvb_frontend_invoke_release() may free the frontend struct. So, the free logic can't update it anymore after calling it. That's OK, as __dvb_frontend_free() is called only when the krefs are zeroed, so nobody is using it anymore. That should fix the following KASAN error: The KASAN report looks like this (running on kernel 3e0cc09a3a2c40ec1ffb6b4e12da86e98feccb11 (4.14-rc5+)): ================================================================== BUG: KASAN: use-after-free in __dvb_frontend_free+0x113/0x120 Write of size 8 at addr ffff880067d45a00 by task kworker/0:1/24 CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc5-43687-g06ab8a23e0e6 #545 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x292/0x395 lib/dump_stack.c:52 print_address_description+0x78/0x280 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 kasan_report+0x23d/0x350 mm/kasan/report.c:409 __asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435 __dvb_frontend_free+0x113/0x120 drivers/media/dvb-core/dvb_frontend.c:156 dvb_frontend_put+0x59/0x70 drivers/media/dvb-core/dvb_frontend.c:176 dvb_frontend_detach+0x120/0x150 drivers/media/dvb-core/dvb_frontend.c:2803 dvb_usb_adapter_frontend_exit+0xd6/0x160 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:340 dvb_usb_adapter_exit drivers/media/usb/dvb-usb/dvb-usb-init.c:116 dvb_usb_exit+0x9b/0x200 drivers/media/usb/dvb-usb/dvb-usb-init.c:132 dvb_usb_device_exit+0xa5/0xf0 drivers/media/usb/dvb-usb/dvb-usb-init.c:295 usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423 __device_release_driver drivers/base/dd.c:861 device_release_driver_internal+0x4f1/0x5c0 drivers/base/dd.c:893 device_release_driver+0x1e/0x30 drivers/base/dd.c:918 bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565 device_del+0x5c4/0xab0 drivers/base/core.c:1985 usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170 usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124 hub_port_connect drivers/usb/core/hub.c:4754 hub_port_connect_change drivers/usb/core/hub.c:5009 port_event drivers/usb/core/hub.c:5115 hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195 process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119 worker_thread+0x221/0x1850 kernel/workqueue.c:2253 kthread+0x363/0x440 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Allocated by task 24: save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x11e/0x2d0 mm/slub.c:2772 kmalloc ./include/linux/slab.h:493 kzalloc ./include/linux/slab.h:666 dtt200u_fe_attach+0x4c/0x110 drivers/media/usb/dvb-usb/dtt200u-fe.c:212 dtt200u_frontend_attach+0x35/0x80 drivers/media/usb/dvb-usb/dtt200u.c:136 dvb_usb_adapter_frontend_init+0x32b/0x660 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:286 dvb_usb_adapter_init drivers/media/usb/dvb-usb/dvb-usb-init.c:86 dvb_usb_init drivers/media/usb/dvb-usb/dvb-usb-init.c:162 dvb_usb_device_init+0xf73/0x17f0 drivers/media/usb/dvb-usb/dvb-usb-init.c:277 dtt200u_usb_probe+0xa1/0xe0 drivers/media/usb/dvb-usb/dtt200u.c:155 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361 really_probe drivers/base/dd.c:413 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463 __device_attach+0x26b/0x3c0 drivers/base/dd.c:710 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523 device_add+0xd0b/0x1660 drivers/base/core.c:1835 usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932 generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174 usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266 really_probe drivers/base/dd.c:413 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463 __device_attach+0x26b/0x3c0 drivers/base/dd.c:710 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523 device_add+0xd0b/0x1660 drivers/base/core.c:1835 usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457 hub_port_connect drivers/usb/core/hub.c:4903 hub_port_connect_change drivers/usb/core/hub.c:5009 port_event drivers/usb/core/hub.c:5115 hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195 process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119 worker_thread+0x221/0x1850 kernel/workqueue.c:2253 kthread+0x363/0x440 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Freed by task 24: save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524 slab_free_hook mm/slub.c:1390 slab_free_freelist_hook mm/slub.c:1412 slab_free mm/slub.c:2988 kfree+0xf6/0x2f0 mm/slub.c:3919 dtt200u_fe_release+0x3c/0x50 drivers/media/usb/dvb-usb/dtt200u-fe.c:202 dvb_frontend_invoke_release.part.13+0x1c/0x30 drivers/media/dvb-core/dvb_frontend.c:2790 dvb_frontend_invoke_release drivers/media/dvb-core/dvb_frontend.c:2789 __dvb_frontend_free+0xad/0x120 drivers/media/dvb-core/dvb_frontend.c:153 dvb_frontend_put+0x59/0x70 drivers/media/dvb-core/dvb_frontend.c:176 dvb_frontend_detach+0x120/0x150 drivers/media/dvb-core/dvb_frontend.c:2803 dvb_usb_adapter_frontend_exit+0xd6/0x160 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:340 dvb_usb_adapter_exit drivers/media/usb/dvb-usb/dvb-usb-init.c:116 dvb_usb_exit+0x9b/0x200 drivers/media/usb/dvb-usb/dvb-usb-init.c:132 dvb_usb_device_exit+0xa5/0xf0 drivers/media/usb/dvb-usb/dvb-usb-init.c:295 usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423 __device_release_driver drivers/base/dd.c:861 device_release_driver_internal+0x4f1/0x5c0 drivers/base/dd.c:893 device_release_driver+0x1e/0x30 drivers/base/dd.c:918 bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565 device_del+0x5c4/0xab0 drivers/base/core.c:1985 usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170 usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124 hub_port_connect drivers/usb/core/hub.c:4754 hub_port_connect_change drivers/usb/core/hub.c:5009 port_event drivers/usb/core/hub.c:5115 hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195 process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119 worker_thread+0x221/0x1850 kernel/workqueue.c:2253 kthread+0x363/0x440 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 The buggy address belongs to the object at ffff880067d45500 which belongs to the cache kmalloc-2048 of size 2048 The buggy address is located 1280 bytes inside of 2048-byte region [ffff880067d45500, ffff880067d45d00) The buggy address belongs to the page: page:ffffea00019f5000 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 0000000000000000 0000000000000000 00000001000f000f raw: dead000000000100 dead000000000200 ffff88006c002d80 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff880067d45900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880067d45980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880067d45a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff880067d45a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880067d45b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: ead666000a5f ("media: dvb_frontend: only use kref after initialized") Reported-by: Andrey Konovalov <andreyknvl@google.com> Suggested-by: Matthias Schwarzott <zzam@gentoo.org> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17media: dvb-core: always call invoke_release() in fe_free()Daniel Scheller
commit 62229de19ff2b7f3e0ebf4d48ad99061127d0281 upstream. Follow-up to: ead666000a5f ("media: dvb_frontend: only use kref after initialized") The aforementioned commit fixed refcount OOPSes when demod driver attaching succeeded but tuner driver didn't. However, the use count of the attached demod drivers don't go back to zero and thus couldn't be cleanly unloaded. Improve on this by calling dvb_frontend_invoke_release() in __dvb_frontend_free() regardless of fepriv being NULL, instead of returning when fepriv is NULL. This is safe to do since _invoke_release() will check for passed pointers being valid before calling the .release() function. [mchehab@s-opensource.com: changed the logic a little bit to reduce conflicts with another bug fix patch under review] Fixes: ead666000a5f ("media: dvb_frontend: only use kref after initialized") Signed-off-by: Daniel Scheller <d.scheller@gmx.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17x86/intel_rdt: Fix potential deadlock during resctrl unmountReinette Chatre
[ Upstream commit 36b6f9fcb8928c06b6638a4cf91bc9d69bb49aa2 ] Lockdep warns about a potential deadlock: [ 66.782842] ====================================================== [ 66.782888] WARNING: possible circular locking dependency detected [ 66.782937] 4.14.0-rc2-test-test+ #48 Not tainted [ 66.782983] ------------------------------------------------------ [ 66.783052] umount/336 is trying to acquire lock: [ 66.783117] (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff81032395>] rdt_kill_sb+0x215/0x390 [ 66.783193] but task is already holding lock: [ 66.783244] (rdtgroup_mutex){+.+.}, at: [<ffffffff810321b6>] rdt_kill_sb+0x36/0x390 [ 66.783305] which lock already depends on the new lock. [ 66.783364] the existing dependency chain (in reverse order) is: [ 66.783419] -> #3 (rdtgroup_mutex){+.+.}: [ 66.783467] __lock_acquire+0x1293/0x13f0 [ 66.783509] lock_acquire+0xaf/0x220 [ 66.783543] __mutex_lock+0x71/0x9b0 [ 66.783575] mutex_lock_nested+0x1b/0x20 [ 66.783610] intel_rdt_online_cpu+0x3b/0x430 [ 66.783649] cpuhp_invoke_callback+0xab/0x8e0 [ 66.783687] cpuhp_thread_fun+0x7a/0x150 [ 66.783722] smpboot_thread_fn+0x1cc/0x270 [ 66.783764] kthread+0x16e/0x190 [ 66.783794] ret_from_fork+0x27/0x40 [ 66.783825] -> #2 (cpuhp_state){+.+.}: [ 66.783870] __lock_acquire+0x1293/0x13f0 [ 66.783906] lock_acquire+0xaf/0x220 [ 66.783938] cpuhp_issue_call+0x102/0x170 [ 66.783974] __cpuhp_setup_state_cpuslocked+0x154/0x2a0 [ 66.784023] __cpuhp_setup_state+0xc7/0x170 [ 66.784061] page_writeback_init+0x43/0x67 [ 66.784097] pagecache_init+0x43/0x4a [ 66.784131] start_kernel+0x3ad/0x3f7 [ 66.784165] x86_64_start_reservations+0x2a/0x2c [ 66.784204] x86_64_start_kernel+0x72/0x75 [ 66.784241] verify_cpu+0x0/0xfb [ 66.784270] -> #1 (cpuhp_state_mutex){+.+.}: [ 66.784319] __lock_acquire+0x1293/0x13f0 [ 66.784355] lock_acquire+0xaf/0x220 [ 66.784387] __mutex_lock+0x71/0x9b0 [ 66.784419] mutex_lock_nested+0x1b/0x20 [ 66.784454] __cpuhp_setup_state_cpuslocked+0x52/0x2a0 [ 66.784497] __cpuhp_setup_state+0xc7/0x170 [ 66.784535] page_alloc_init+0x28/0x30 [ 66.784569] start_kernel+0x148/0x3f7 [ 66.784602] x86_64_start_reservations+0x2a/0x2c [ 66.784642] x86_64_start_kernel+0x72/0x75 [ 66.784678] verify_cpu+0x0/0xfb [ 66.784707] -> #0 (cpu_hotplug_lock.rw_sem){++++}: [ 66.784759] check_prev_add+0x32f/0x6e0 [ 66.784794] __lock_acquire+0x1293/0x13f0 [ 66.784830] lock_acquire+0xaf/0x220 [ 66.784863] cpus_read_lock+0x3d/0xb0 [ 66.784896] rdt_kill_sb+0x215/0x390 [ 66.784930] deactivate_locked_super+0x3e/0x70 [ 66.784968] deactivate_super+0x40/0x60 [ 66.785003] cleanup_mnt+0x3f/0x80 [ 66.785034] __cleanup_mnt+0x12/0x20 [ 66.785070] task_work_run+0x8b/0xc0 [ 66.785103] exit_to_usermode_loop+0x94/0xa0 [ 66.786804] syscall_return_slowpath+0xe8/0x150 [ 66.788502] entry_SYSCALL_64_fastpath+0xab/0xad [ 66.790194] other info that might help us debug this: [ 66.795139] Chain exists of: cpu_hotplug_lock.rw_sem --> cpuhp_state --> rdtgroup_mutex [ 66.800035] Possible unsafe locking scenario: [ 66.803267] CPU0 CPU1 [ 66.804867] ---- ---- [ 66.806443] lock(rdtgroup_mutex); [ 66.808002] lock(cpuhp_state); [ 66.809565] lock(rdtgroup_mutex); [ 66.811110] lock(cpu_hotplug_lock.rw_sem); [ 66.812608] *** DEADLOCK *** [ 66.816983] 2 locks held by umount/336: [ 66.818418] #0: (&type->s_umount_key#35){+.+.}, at: [<ffffffff81229738>] deactivate_super+0x38/0x60 [ 66.819922] #1: (rdtgroup_mutex){+.+.}, at: [<ffffffff810321b6>] rdt_kill_sb+0x36/0x390 When the resctrl filesystem is unmounted the locks should be obtain in the locks in the same order as was done when the cpus came online: cpu_hotplug_lock before rdtgroup_mutex. This also requires to switch the static_branch_disable() calls to the _cpulocked variant because now cpu hotplug lock is held already. [ tglx: Switched to cpus_read_[un]lock ] Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Acked-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Acked-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/cc292e76be073f7260604651711c47b09fd0dc81.1508490116.git.reinette.chatre@intel.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17RDMA/cxgb4: Annotate r2 and stag as __be32Leon Romanovsky
[ Upstream commit 7d7d065a5eec7e218174d5c64a9f53f99ffdb119 ] Chelsio cxgb4 HW is big-endian, hence there is need to properly annotate r2 and stag fields as __be32 and not __u32 to fix the following sparse warnings. drivers/infiniband/hw/cxgb4/qp.c:614:16: warning: incorrect type in assignment (different base types) expected unsigned int [unsigned] [usertype] r2 got restricted __be32 [usertype] <noident> drivers/infiniband/hw/cxgb4/qp.c:615:18: warning: incorrect type in assignment (different base types) expected unsigned int [unsigned] [usertype] stag got restricted __be32 [usertype] <noident> Cc: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17md: free unused memory after bitmap resizeZdenek Kabelac
[ Upstream commit 0868b99c214a3d55486c700de7c3f770b7243e7c ] When bitmap is resized, the old kalloced chunks just are not released once the resized bitmap starts to use new space. This fixes in particular kmemleak reports like this one: unreferenced object 0xffff8f4311e9c000 (size 4096): comm "lvm", pid 19333, jiffies 4295263268 (age 528.265s) hex dump (first 32 bytes): 02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80 ................ 02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80 ................ backtrace: [<ffffffffa69471ca>] kmemleak_alloc+0x4a/0xa0 [<ffffffffa628c10e>] kmem_cache_alloc_trace+0x14e/0x2e0 [<ffffffffa676cfec>] bitmap_checkpage+0x7c/0x110 [<ffffffffa676d0c5>] bitmap_get_counter+0x45/0xd0 [<ffffffffa676d6b3>] bitmap_set_memory_bits+0x43/0xe0 [<ffffffffa676e41c>] bitmap_init_from_disk+0x23c/0x530 [<ffffffffa676f1ae>] bitmap_load+0xbe/0x160 [<ffffffffc04c47d3>] raid_preresume+0x203/0x2f0 [dm_raid] [<ffffffffa677762f>] dm_table_resume_targets+0x4f/0xe0 [<ffffffffa6774b52>] dm_resume+0x122/0x140 [<ffffffffa6779b9f>] dev_suspend+0x18f/0x290 [<ffffffffa677a3a7>] ctl_ioctl+0x287/0x560 [<ffffffffa677a693>] dm_ctl_ioctl+0x13/0x20 [<ffffffffa62d6b46>] do_vfs_ioctl+0xa6/0x750 [<ffffffffa62d7269>] SyS_ioctl+0x79/0x90 [<ffffffffa6956d41>] entry_SYSCALL_64_fastpath+0x1f/0xc2 Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17dm raid: fix panic when attempting to force a raid to syncHeinz Mauelshagen
[ Upstream commit 233978449074ca7e45d9c959f9ec612d1b852893 ] Requesting a sync on an active raid device via a table reload (see 'sync' parameter in Documentation/device-mapper/dm-raid.txt) skips the super_load() call that defines the superblock size (rdev->sb_size) -- resulting in an oops if/when super_sync()->memset() is called. Fix by moving the initialization of the superblock start and size out of super_load() to the caller (analyse_superblocks). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17audit: ensure that 'audit=1' actually enables audit for PID 1Paul Moore
[ Upstream commit 173743dd99a49c956b124a74c8aacb0384739a4c ] Prior to this patch we enabled audit in audit_init(), which is too late for PID 1 as the standard initcalls are run after the PID 1 task is forked. This means that we never allocate an audit_context (see audit_alloc()) for PID 1 and therefore miss a lot of audit events generated by PID 1. This patch enables audit as early as possible to help ensure that when PID 1 is forked it can allocate an audit_context if required. Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17audit: Allow auditd to set pid to 0 to end auditingSteve Grubb
[ Upstream commit 33e8a907804428109ce1d12301c3365d619cc4df ] The API to end auditing has historically been for auditd to set the pid to 0. This patch restores that functionality. See: https://github.com/linux-audit/audit-kernel/issues/69 Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17nvmet-rdma: update queue list during ib_device removalIsrael Rukshin
[ Upstream commit 43b92fd27aaef0f529c9321cfebbaec1d7b8f503 ] A NULL deref happens when nvmet_rdma_remove_one() is called more than once (e.g. while connected via 2 ports). The first call frees the queues related to the first ib_device but doesn't remove them from the queue list. While calling nvmet_rdma_remove_one() for the second ib_device it goes over the full queue list again and we get the NULL deref. Fixes: f1d4ef7d ("nvmet-rdma: register ib_client to not deadlock in device removal") Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grmberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17blk-mq: Avoid that request queue removal can trigger list corruptionBart Van Assche
[ Upstream commit aba7afc5671c23beade64d10caf86e24a9105dab ] Avoid that removal of a request queue sporadically triggers the following warning: list_del corruption. next->prev should be ffff8807d649b970, but was 6b6b6b6b6b6b6b6b WARNING: CPU: 3 PID: 342 at lib/list_debug.c:56 __list_del_entry_valid+0x92/0xa0 Call Trace: process_one_work+0x11b/0x660 worker_thread+0x3d/0x3b0 kthread+0x129/0x140 ret_from_fork+0x27/0x40 Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17ide: ide-atapi: fix compile error with defining macro DEBUGHongxu Jia
[ Upstream commit 8dc7a31fbce5e2dbbacd83d910da37105181b054 ] Compile ide-atapi failed with defining macro "DEBUG" ... |drivers/ide/ide-atapi.c:285:52: error: 'struct request' has no member named 'cmd'; did you mean 'csd'? | debug_log("%s: rq->cmd[0]: 0x%x\n", __func__, rq->cmd[0]); ... Since we split the scsi_request out of struct request, it missed do the same thing on debug_log Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17ipvlan: fix ipv6 outbound deviceKeefe Liu
[ Upstream commit ca29fd7cce5a6444d57fb86517589a1a31c759e1 ] When process the outbound packet of ipv6, we should assign the master device to output device other than input device. Signed-off-by: Keefe Liu <liuqifa@huawei.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17powerpc/powernv/idle: Round up latency and residency valuesVaidyanathan Srinivasan
[ Upstream commit 8d4e10e9ed9450e18fbbf6a8872be0eac9fd4999 ] On PowerNV platforms, firmware provides exit latency and target residency for each of the idle states in nano seconds. Cpuidle framework expects the values in micro seconds. Round up to nearest micro seconds to avoid errors in cases where the values are defined as fractional micro seconds. Default idle state of 'snooze' has exit latency of zero. If other states have fractional micro second exit latency, they would get rounded down to zero micro second and make cpuidle framework choose deeper idle state when snooze loop is the right choice. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17kbuild: do not call cc-option before KBUILD_CFLAGS initializationMasahiro Yamada
[ Upstream commit 433dc2ebe7d17dd21cba7ad5c362d37323592236 ] Some $(call cc-option,...) are invoked very early, even before KBUILD_CFLAGS, etc. are initialized. The returned string from $(call cc-option,...) depends on KBUILD_CPPFLAGS, KBUILD_CFLAGS, and GCC_PLUGINS_CFLAGS. Since they are exported, they are not empty when the top Makefile is recursively invoked. The recursion occurs in several places. For example, the top Makefile invokes itself for silentoldconfig. "make tinyconfig", "make rpm-pkg" are the cases, too. In those cases, the second call of cc-option from the same line runs a different shell command due to non-pristine KBUILD_CFLAGS. To get the same result all the time, KBUILD_* and GCC_PLUGINS_CFLAGS must be initialized before any call of cc-option. This avoids garbage data in the .cache.mk file. Move all calls of cc-option below the config targets because target compiler flags are unnecessary for Kconfig. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17KVM: arm/arm64: vgic-its: Preserve the revious read from the pending tableMarc Zyngier
commit 64afe6e9eb4841f35317da4393de21a047a883b3 upstream. The current pending table parsing code assumes that we keep the previous read of the pending bits, but keep that variable in the current block, making sure it is discarded on each loop. We end-up using whatever is on the stack. Who knows, it might just be the right thing... Fixes: 33d3bc9556a7d ("KVM: arm64: vgic-its: Read initial LPI pending table") Reported-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17fix kcm_clone()Al Viro
commit a5739435b5a3b8c449f8844ecd71a3b1e89f0a33 upstream. 1) it's fput() or sock_release(), not both 2) don't do fd_install() until the last failure exit. 3) not a bug per se, but... don't attach socket to struct file until it's set up. Take reserving descriptor into the caller, move fd_install() to the caller, sanitize failure exits and calling conventions. Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscallJeff Layton
commit 4d2dc2cc766c3b51929658cacbc6e34fc8e242fb upstream. Currently, we're capping the values too low in the F_GETLK64 case. The fields in that structure are 64-bit values, so we shouldn't need to do any sort of fixup there. Make sure we check that assumption at build time in the future however by ensuring that the sizes we're copying will fit. With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it. Fixes: 94073ad77fff2 (fs/locks: don't mess with the address limit in compat_fcntl64) Reported-by: Vitaly Lipatov <lav@etersoft.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17usb: gadget: ffs: Forbid usb_ep_alloc_request from sleepingVincent Pelletier
commit 30bf90ccdec1da9c8198b161ecbff39ce4e5a9ba upstream. Found using DEBUG_ATOMIC_SLEEP while submitting an AIO read operation: [ 100.853642] BUG: sleeping function called from invalid context at mm/slab.h:421 [ 100.861148] in_atomic(): 1, irqs_disabled(): 1, pid: 1880, name: python [ 100.867954] 2 locks held by python/1880: [ 100.867961] #0: (&epfile->mutex){....}, at: [<f8188627>] ffs_mutex_lock+0x27/0x30 [usb_f_fs] [ 100.868020] #1: (&(&ffs->eps_lock)->rlock){....}, at: [<f818ad4b>] ffs_epfile_io.isra.17+0x24b/0x590 [usb_f_fs] [ 100.868076] CPU: 1 PID: 1880 Comm: python Not tainted 4.14.0-edison+ #118 [ 100.868085] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48 [ 100.868093] Call Trace: [ 100.868122] dump_stack+0x47/0x62 [ 100.868156] ___might_sleep+0xfd/0x110 [ 100.868182] __might_sleep+0x68/0x70 [ 100.868217] kmem_cache_alloc_trace+0x4b/0x200 [ 100.868248] ? dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3] [ 100.868302] dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3] [ 100.868343] usb_ep_alloc_request+0x16/0xc0 [udc_core] [ 100.868386] ffs_epfile_io.isra.17+0x444/0x590 [usb_f_fs] [ 100.868424] ? _raw_spin_unlock_irqrestore+0x27/0x40 [ 100.868457] ? kiocb_set_cancel_fn+0x57/0x60 [ 100.868477] ? ffs_ep0_poll+0xc0/0xc0 [usb_f_fs] [ 100.868512] ffs_epfile_read_iter+0xfe/0x157 [usb_f_fs] [ 100.868551] ? security_file_permission+0x9c/0xd0 [ 100.868587] ? rw_verify_area+0xac/0x120 [ 100.868633] aio_read+0x9d/0x100 [ 100.868692] ? __fget+0xa2/0xd0 [ 100.868727] ? __might_sleep+0x68/0x70 [ 100.868763] SyS_io_submit+0x471/0x680 [ 100.868878] do_int80_syscall_32+0x4e/0xd0 [ 100.868921] entry_INT80_32+0x2a/0x2a [ 100.868932] EIP: 0xb7fbb676 [ 100.868941] EFLAGS: 00000292 CPU: 1 [ 100.868951] EAX: ffffffda EBX: b7aa2000 ECX: 00000002 EDX: b7af8368 [ 100.868961] ESI: b7fbb660 EDI: b7aab000 EBP: bfb6c658 ESP: bfb6c638 [ 100.868973] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17ipmi: Stop timers before cleaning up the moduleMasamitsu Yamazaki
commit 4f7f5551a760eb0124267be65763008169db7087 upstream. System may crash after unloading ipmi_si.ko module because a timer may remain and fire after the module cleaned up resources. cleanup_one_si() contains the following processing. /* * Make sure that interrupts, the timer and the thread are * stopped and will not run again. */ if (to_clean->irq_cleanup) to_clean->irq_cleanup(to_clean); wait_for_timer_and_thread(to_clean); /* * Timeouts are stopped, now make sure the interrupts are off * in the BMC. Note that timers and CPU interrupts are off, * so no need for locks. */ while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) { poll(to_clean); schedule_timeout_uninterruptible(1); } si_state changes as following in the while loop calling poll(to_clean). SI_GETTING_MESSAGES => SI_CHECKING_ENABLES => SI_SETTING_ENABLES => SI_GETTING_EVENTS => SI_NORMAL As written in the code comments above, timers are expected to stop before the polling loop and not to run again. But the timer is set again in the following process when si_state becomes SI_SETTING_ENABLES. => poll => smi_event_handler => handle_transaction_done // smi_info->si_state == SI_SETTING_ENABLES => start_getting_events => start_new_msg => smi_mod_timer => mod_timer As a result, before the timer set in start_new_msg() expires, the polling loop may see si_state becoming SI_NORMAL and the module clean-up finishes. For example, hard LOCKUP and panic occurred as following. smi_timeout was called after smi_event_handler, kcs_event and hangs at port_inb() trying to access I/O port after release. [exception RIP: port_inb+19] RIP: ffffffffc0473053 RSP: ffff88069fdc3d80 RFLAGS: 00000006 RAX: ffff8806800f8e00 RBX: ffff880682bd9400 RCX: 0000000000000000 RDX: 0000000000000ca3 RSI: 0000000000000ca3 RDI: ffff8806800f8e40 RBP: ffff88069fdc3d80 R8: ffffffff81d86dfc R9: ffffffff81e36426 R10: 00000000000509f0 R11: 0000000000100000 R12: 0000000000]:000000 R13: 0000000000000000 R14: 0000000000000246 R15: ffff8806800f8e00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 --- <NMI exception stack> --- To fix the problem I defined a flag, timer_can_start, as member of struct smi_info. The flag is enabled immediately after initializing the timer and disabled immediately before waiting for timer deletion. Fixes: 0cfec916e86d ("ipmi: Start the timer and thread on internal msgs") Signed-off-by: Yamazaki Masamitsu <m-yamazaki@ah.jp.nec.com> [Some fairly major changes went into the IPMI driver in 4.15, so this required a backport as the code had changed and moved to a different file.] Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17sctp: use right member as the param of list_for_each_entryXin Long
[ Upstream commit a8dd397903a6e57157f6265911f7d35681364427 ] Commit d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues when migrating a sock") made a mistake that using 'list' as the param of list_for_each_entry to traverse the retransmit, sacked and abandoned queues, while chunks are using 'transmitted_list' to link into these queues. It could cause NULL dereference panic if there are chunks in any of these queues when peeling off one asoc. So use the chunk member 'transmitted_list' instead in this patch. Fixes: d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues when migrating a sock") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17cls_bpf: don't decrement net's refcount when offload failsJakub Kicinski
[ Upstream commit 25415cec502a1232b19fffc85465882b19a90415 ] When cls_bpf offload was added it seemed like a good idea to call cls_bpf_delete_prog() instead of extending the error handling path, since the software state is fully initialized at that point. This handling of errors without jumping to the end of the function is error prone, as proven by later commit missing that extra call to __cls_bpf_delete_prog(). __cls_bpf_delete_prog() is now expected to be invoked with a reference on exts->net or the field zeroed out. The call on the offload's error patch does not fullfil this requirement, leading to each error stealing a reference on net namespace. Create a function undoing what cls_bpf_set_parms() did and use it from __cls_bpf_delete_prog() and the error path. Fixes: aae2c35ec892 ("cls_bpf: use tcf_exts_get_net() before call_rcu()") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17net: openvswitch: datapath: fix data type in queue_gso_packetsGustavo A. R. Silva
[ Upstream commit 2734166e89639c973c6e125ac8bcfc2d9db72b70 ] gso_type is being used in binary AND operations together with SKB_GSO_UDP. The issue is that variable gso_type is of type unsigned short and SKB_GSO_UDP expands to more than 16 bits: SKB_GSO_UDP = 1 << 16 this makes any binary AND operation between gso_type and SKB_GSO_UDP to be always zero, hence making some code unreachable and likely causing undesired behavior. Fix this by changing the data type of variable gso_type to unsigned int. Addresses-Coverity-ID: 1462223 Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17net: accept UFO datagrams from tuntap and packetWillem de Bruijn
[ Upstream commit 0c19f846d582af919db66a5914a0189f9f92c936 ] Tuntap and similar devices can inject GSO packets. Accept type VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively. Processes are expected to use feature negotiation such as TUNSETOFFLOAD to detect supported offload types and refrain from injecting other packets. This process breaks down with live migration: guest kernels do not renegotiate flags, so destination hosts need to expose all features that the source host does. Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677. This patch introduces nearly(*) no new code to simplify verification. It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP insertion and software UFO segmentation. It does not reinstate protocol stack support, hardware offload (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception of VIRTIO_NET_HDR_GSO_UDP packets in tuntap. To support SKB_GSO_UDP reappearing in the stack, also reinstate logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD by squashing in commit 939912216fa8 ("net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1 ("net: avoid skb_warn_bad_offload false positives on UFO"). (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id, ipv6_proxy_select_ident is changed to return a __be32 and this is assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted at the end of the enum to minimize code churn. Tested Booted a v4.13 guest kernel with QEMU. On a host kernel before this patch `ethtool -k eth0` shows UFO disabled. After the patch, it is enabled, same as on a v4.13 host kernel. A UFO packet sent from the guest appears on the tap device: host: nc -l -p -u 8000 & tcpdump -n -i tap0 guest: dd if=/dev/zero of=payload.txt bs=1 count=2000 nc -u 192.16.1.1 8000 < payload.txt Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds, packets arriving fragmented: ./with_tap_pair.sh ./tap_send_ufo tap0 tap1 (from https://github.com/wdebruij/kerneltools/tree/master/tests) Changes v1 -> v2 - simplified set_offload change (review comment) - documented test procedure Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com> Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.") Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tun: fix rcu_read_lock imbalance in tun_build_skbXin Long
[ Upstream commit 654d573845f35017dc397840fa03610fef3d08b0 ] rcu_read_lock in tun_build_skb is used to rcu_dereference tun->xdp_prog safely, rcu_read_unlock should be done in every return path. Now I could see one place missing it, where it returns NULL in switch-case XDP_REDIRECT, another palce using rcu_read_lock wrongly, where it returns NULL in if (xdp_xmit) chunk. So fix both in this patch. Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17net: ipv6: Fixup device for anycast routes during copyDavid Ahern
[ Upstream commit 98d11291d189cb5adf49694d0ad1b971c0212697 ] Florian reported a breakage with anycast routes due to commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address"). Prior to this commit anycast routes were added against the loopback device causing repetitive route entries with no insight into why they existed. e.g.: $ ip -6 ro ls table local type anycast anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium anycast fe80:: dev lo proto kernel metric 0 pref medium anycast fe80:: dev lo proto kernel metric 0 pref medium The point of commit 4832c30d5458 is to add the routes using the device with the address which is causing the route to be added. e.g.,: $ ip -6 ro ls table local type anycast anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium anycast fe80:: dev eth2 proto kernel metric 0 pref medium anycast fe80:: dev eth1 proto kernel metric 0 pref medium For traffic to work as it did before, the dst device needs to be switched to the loopback when the copy is created similar to local routes. Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tun: free skb in early errorsWei Xu
[ Upstream commit c33ee15b3820a03cf8229ba9415084197b827f8c ] tun_recvmsg() supports accepting skb by msg_control after commit ac77cfd4258f ("tun: support receiving skb through msg_control"), the skb if presented should be freed no matter how far it can go along, otherwise it would be leaked. This patch fixes several missed cases. Signed-off-by: Wei Xu <wexu@redhat.com> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tcp: when scheduling TLP, time of RTO should account for current ACKNeal Cardwell
[ Upstream commit ed66dfaf236c04d414de1d218441296e57fb2bd2 ] Fix the TLP scheduling logic so that when scheduling a TLP probe, we ensure that the estimated time at which an RTO would fire accounts for the fact that ACKs indicating forward progress should push back RTO times. After the following fix: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") we had an unintentional behavior change in the following kind of scenario: suppose the RTT variance has been very low recently. Then suppose we send out a flight of N packets and our RTT is 100ms: t=0: send a flight of N packets t=100ms: receive an ACK for N-1 packets The response before df92c8394e6e that was: -> schedule a TLP for now + RTO_interval The response after df92c8394e6e is: -> schedule a TLP for t=0 + RTO_interval Since RTO_interval = srtt + RTT_variance, this means that we have scheduled a TLP timer at a point in the future that only accounts for RTT_variance. If the RTT_variance term is small, this means that the timer fires soon. Before df92c8394e6e this would not happen, because in that code, when we receive an ACK for a prefix of flight, we did: 1) Near the top of tcp_ack(), switch from TLP timer to RTO at write_queue_head->paket_tx_time + RTO_interval: if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) tcp_rearm_rto(sk); 2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval: if (flag & FLAG_ACKED) { tcp_rearm_rto(sk); 3) In tcp_ack() after tcp_fastretrans_alert() switch from RTO to TLP at now + RTO_interval: if (icsk->icsk_pending == ICSK_TIME_RETRANS) tcp_schedule_loss_probe(sk); In df92c8394e6e we removed that 3-phase dance, and instead directly set the TLP timer once: we set the TLP timer in cases like this to write_queue_head->packet_tx_time + RTO_interval. So if the RTT variance is small, then this means that this is setting the TLP timer to fire quite soon. This means if the ACK for the tail of the flight takes longer than an RTT to arrive (often due to delayed ACKs), then the TLP timer fires too quickly. Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tap: free skb if flags errorWei Xu
[ Upstream commit 61d78537843e676e7f56ac6db333db0c0529b892 ] tap_recvmsg() supports accepting skb by msg_control after commit 3b4ba04acca8 ("tap: support receiving skb from msg_control"), the skb if presented should be freed within the function, otherwise it would be leaked. Signed-off-by: Wei Xu <wexu@redhat.com> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17net: sched: cbq: create block for q->link.blockJiri Pirko
[ Upstream commit d51aae68b142f48232257e96ce317db25445418d ] q->link.block is not initialized, that leads to EINVAL when one tries to add filter there. So initialize it properly. This can be reproduced by: $ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 1000Mbit $ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 0 0x00 flowid 1:1 Reported-by: Jaroslav Aster <jaster@redhat.com> Reported-by: Ivan Vecera <ivecera@redhat.com> Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tcp: use current time in tcp_rcv_space_adjust()Eric Dumazet
[ Upstream commit 8632385022f2b05a6ca0b9e0f95575865de0e2ce ] When I switched rcv_rtt_est to high resolution timestamps, I forgot that tp->tcp_mstamp needed to be refreshed in tcp_rcv_space_adjust() Using an old timestamp leads to autotuning lags. Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv()Tommi Rantala
[ Upstream commit c7799c067c2ae33e348508c8afec354f3257ff25 ] Remove the second tipc_rcv() call in tipc_udp_recv(). We have just checked that the bearer is not up, and calling tipc_rcv() with a bearer that is not up leads to a TIPC div-by-zero crash in tipc_node_calculate_timer(). The crash is rare in practice, but can happen like this: We're enabling a bearer, but it's not yet up and fully initialized. At the same time we receive a discovery packet, and in tipc_udp_recv() we end up calling tipc_rcv() with the not-yet-initialized bearer, causing later the div-by-zero crash in tipc_node_calculate_timer(). Jon Maloy explains the impact of removing the second tipc_rcv() call: "link setup in the worst case will be delayed until the next arriving discovery messages, 1 sec later, and this is an acceptable delay." As the tipc_rcv() call is removed, just leave the function via the rcu_out label, so that we will kfree_skb(). [ 12.590450] Own node address <1.1.1>, network identity 1 [ 12.668088] divide error: 0000 [#1] SMP [ 12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1 [ 12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000 [ 12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] [ 12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246 [ 12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000 [ 12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600 [ 12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001 [ 12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8 [ 12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800 [ 12.702338] FS: 0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000 [ 12.705099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0 [ 12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 12.712627] Call Trace: [ 12.713390] <IRQ> [ 12.714011] tipc_node_check_dest+0x2e8/0x350 [tipc] [ 12.715286] tipc_disc_rcv+0x14d/0x1d0 [tipc] [ 12.716370] tipc_rcv+0x8b0/0xd40 [tipc] [ 12.717396] ? minmax_running_min+0x2f/0x60 [ 12.718248] ? dst_alloc+0x4c/0xa0 [ 12.718964] ? tcp_ack+0xaf1/0x10b0 [ 12.719658] ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc] [ 12.720634] tipc_udp_recv+0x71/0x1d0 [tipc] [ 12.721459] ? dst_alloc+0x4c/0xa0 [ 12.722130] udp_queue_rcv_skb+0x264/0x490 [ 12.722924] __udp4_lib_rcv+0x21e/0x990 [ 12.723670] ? ip_route_input_rcu+0x2dd/0xbf0 [ 12.724442] ? tcp_v4_rcv+0x958/0xa40 [ 12.725039] udp_rcv+0x1a/0x20 [ 12.725587] ip_local_deliver_finish+0x97/0x1d0 [ 12.726323] ip_local_deliver+0xaf/0xc0 [ 12.726959] ? ip_route_input_noref+0x19/0x20 [ 12.727689] ip_rcv_finish+0xdd/0x3b0 [ 12.728307] ip_rcv+0x2ac/0x360 [ 12.728839] __netif_receive_skb_core+0x6fb/0xa90 [ 12.729580] ? udp4_gro_receive+0x1a7/0x2c0 [ 12.730274] __netif_receive_skb+0x1d/0x60 [ 12.730953] ? __netif_receive_skb+0x1d/0x60 [ 12.731637] netif_receive_skb_internal+0x37/0xd0 [ 12.732371] napi_gro_receive+0xc7/0xf0 [ 12.732920] receive_buf+0x3c3/0xd40 [ 12.733441] virtnet_poll+0xb1/0x250 [ 12.733944] net_rx_action+0x23e/0x370 [ 12.734476] __do_softirq+0xc5/0x2f8 [ 12.734922] irq_exit+0xfa/0x100 [ 12.735315] do_IRQ+0x4f/0xd0 [ 12.735680] common_interrupt+0xa2/0xa2 [ 12.736126] </IRQ> [ 12.736416] RIP: 0010:native_safe_halt+0x6/0x10 [ 12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d [ 12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000 [ 12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88 [ 12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 [ 12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000 [ 12.741831] default_idle+0x2a/0x100 [ 12.742323] arch_cpu_idle+0xf/0x20 [ 12.742796] default_idle_call+0x28/0x40 [ 12.743312] do_idle+0x179/0x1f0 [ 12.743761] cpu_startup_entry+0x1d/0x20 [ 12.744291] start_secondary+0x112/0x120 [ 12.744816] secondary_startup_64+0xa5/0xa5 [ 12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00 00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48 89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f [ 12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0 [ 12.748555] ---[ end trace 1399ab83390650fd ]--- [ 12.749296] Kernel panic - not syncing: Fatal exception in interrupt [ 12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 12.751215] Rebooting in 60 seconds.. Fixes: c9b64d492b1f ("tipc: add replicast peer discovery") Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()David Ahern
[ Usptream commit b4d1605a8ea608fd7dc45b926a05d75d340bde4b ] After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"), socket lookups happen while skb->cb[] has not been mangled yet by TCP. Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17s390/qeth: fix GSO throughput regressionJulian Wiedmann
[ Upstream commit 6d69b1f1eb7a2edf8a3547f361c61f2538e054bb ] Using GSO with small MTUs currently results in a substantial throughput regression - which is caused by how qeth needs to map non-linear skbs into its IO buffer elements: compared to a linear skb, each GSO-segmented skb effectively consumes twice as many buffer elements (ie two instead of one) due to the additional header-only part. This causes the Output Queue to be congested with low-utilized IO buffers. Fix this as follows: If the MSS is low enough so that a non-SG GSO segmentation produces order-0 skbs (currently ~3500 byte), opt out from NETIF_F_SG. This is where we anticipate the biggest savings, since an SG-enabled GSO segmentation produces skbs that always consume at least two buffer elements. Larger MSS values continue to get a SG-enabled GSO segmentation, since 1) the relative overhead of the additional header-only buffer element becomes less noticeable, and 2) the linearization overhead increases. With the throughput regression fixed, re-enable NETIF_F_SG by default to reap the significant CPU savings of GSO. Fixes: 5722963a8e83 ("qeth: do not turn on SG per default") Reported-by: Nils Hoppmann <niho@de.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17s390/qeth: fix thinko in IPv4 multicast address trackingJulian Wiedmann
[ Upsteam commit bc3ab70584696cb798b9e1e0ac8e6ced5fd4c3b8 ] Commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") reworked how secondary addresses are managed for qeth devices. Instead of dropping & subsequently re-adding all addresses on every ndo_set_rx_mode() call, qeth now keeps track of the addresses that are currently registered with the HW. On a ndo_set_rx_mode(), we thus only need to do (de-)registration requests for the addresses that have actually changed. On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong hashtable - and thus never finds a match. As a result, we first delete *all* such addresses, and then re-add them again. So each set_rx_mode() causes a short period where the IPv4 Multicast addresses are not registered, and the card stops forwarding inbound traffic for them. Fix this by setting the ->is_multicast flag on the lookup object, thus enabling qeth_l3_ip_from_hash() to search the correct hashtable and find a match there. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17s390/qeth: build max size GSO skbs on L2 devicesJulian Wiedmann
[ Upstream commit 0cbff6d4546613330a1c5f139f5c368e4ce33ca1 ] The current GSO skb size limit was copy&pasted over from the L3 path, where it is needed due to a TSO limitation. As L2 devices don't offer TSO support (and thus all GSO skbs are segmented before they reach the driver), there's no reason to restrict the stack in how large it may build the GSO skbs. Fixes: d52aec97e5bc ("qeth: enable scatter/gather in layer 2 mode") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tcp/dccp: block bh before arming time_wait timerEric Dumazet
[ Upstream commit cfac7f836a715b91f08c851df915d401a4d52783 ] Maciej Żenczykowski reported some panics in tcp_twsk_destructor() that might be caused by the following bug. timewait timer is pinned to the cpu, because we want to transition timwewait refcount from 0 to 4 in one go, once everything has been initialized. At the time commit ed2e92394589 ("tcp/dccp: fix timewait races in timer handling") was merged, TCP was always running from BH habdler. After commit 5413d1babe8f ("net: do not block BH while processing socket backlog") we definitely can run tcp_time_wait() from process context. We need to block BH in the critical section so that the pinned timer has still its purpose. This bug is more likely to happen under stress and when very small RTO are used in datacenter flows. Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17stmmac: reset last TSO segment size after device openLars Persson
[ Upstream commit 45ab4b13e46325d00f4acdb365d406e941a15f81 ] The mss variable tracks the last max segment size sent to the TSO engine. We do not update the hardware as long as we receive skb:s with the same value in gso_size. During a network device down/up cycle (mapped to stmmac_release() and stmmac_open() callbacks) we issue a reset to the hardware and it forgets the setting for mss. However we did not zero out our mss variable so the next transmission of a gso packet happens with an undefined hardware setting. This triggers a hang in the TSO engine and eventuelly the netdev watchdog will bark. Fixes: f748be531d70 ("stmmac: support new GMAC4") Signed-off-by: Lars Persson <larper@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17net: remove hlist_nulls_add_tail_rcu()Eric Dumazet
[ Upstream commit d7efc6c11b277d9d80b99b1334a78bfe7d7edf10 ] Alexander Potapenko reported use of uninitialized memory [1] This happens when inserting a request socket into TCP ehash, in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized. Bug was added by commit d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets") Note that d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix") missed the opportunity to get rid of hlist_nulls_add_tail_rcu() : Both UDP sockets and TCP/DCCP listeners no longer use __sk_nulls_add_node_rcu() for their hash insertion. Since all other sockets have unique 4-tuple, the reuseport status has no special meaning, so we can always use hlist_nulls_add_head_rcu() for them and save few cycles/instructions. [1] ================================================================== BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace:  <IRQ>  __dump_stack lib/dump_stack.c:16  dump_stack+0x185/0x1d0 lib/dump_stack.c:52  kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016  __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766  __sk_nulls_add_node_rcu ./include/net/sock.h:684  inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413  reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754  inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765  tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414  tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314  tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917  tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483  tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763  ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216  NF_HOOK ./include/linux/netfilter.h:248  ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257  dst_input ./include/net/dst.h:477  ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397  NF_HOOK ./include/linux/netfilter.h:248  ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488  __netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298  __netif_receive_skb net/core/dev.c:4336  netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497  napi_skb_finish net/core/dev.c:4858  napi_gro_receive+0x629/0xa50 net/core/dev.c:4889  e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018  e1000_clean_rx_irq+0x1492/0x1d30 drivers/net/ethernet/intel/e1000/e1000_main.c:4474  e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819  napi_poll net/core/dev.c:5500  net_rx_action+0x73c/0x1820 net/core/dev.c:5566  __do_softirq+0x4b4/0x8dd kernel/softirq.c:284  invoke_softirq kernel/softirq.c:364  irq_exit+0x203/0x240 kernel/softirq.c:405  exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638  do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263  common_interrupt+0x86/0x86 Fixes: d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets") Fixes: d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexander Potapenko <glider@google.com> Acked-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17usbnet: fix alignment for frames with no ethernet headerBjørn Mork
[ Upstream commit a4abd7a80addb4a9547f7dfc7812566b60ec505c ] The qmi_wwan minidriver support a 'raw-ip' mode where frames are received without any ethernet header. This causes alignment issues because the skbs allocated by usbnet are "IP aligned". Fix by allowing minidrivers to disable the additional alignment offset. This is implemented using a per-device flag, since the same minidriver also supports 'ethernet' mode. Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode") Reported-and-tested-by: Jay Foster <jay@systech.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tcp: remove buggy call to tcp_v6_restore_cb()Eric Dumazet
[ Upstream commit 3016dad75b48279e579117ee3ed566ba90a3b023 ] tcp_v6_send_reset() expects to receive an skb with skb->cb[] layout as used in TCP stack. MD5 lookup uses tcp_v6_iif() and tcp_v6_sdif() and thus TCP_SKB_CB(skb)->header.h6 This patch probably fixes RST packets sent on behalf of a timewait md5 ipv6 socket. Before Florian patch, tcp_v6_restore_cb() was needed before jumping to no_tcp_socket label. Fixes: 271c3b9b7bda ("tcp: honour SO_BINDTODEVICE for TW_RST case too") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17net/packet: fix a race in packet_bind() and packet_notifier()Eric Dumazet
[ Upstream commit 15fe076edea787807a7cdc168df832544b58eba6 ] syzbot reported crashes [1] and provided a C repro easing bug hunting. When/if packet_do_bind() calls __unregister_prot_hook() and releases po->bind_lock, another thread can run packet_notifier() and process an NETDEV_UP event. This calls register_prot_hook() and hooks again the socket right before first thread is able to grab again po->bind_lock. Fixes this issue by temporarily setting po->num to 0, as suggested by David Miller. [1] dev_remove_pack: ffff8801bf16fa80 not found ------------[ cut here ]------------ kernel BUG at net/core/dev.c:7945! ( BUG_ON(!list_empty(&dev->ptype_all)); ) invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: device syz0 entered promiscuous mode CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801cc57a500 task.stack: ffff8801cc588000 RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945 RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293 RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2 RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810 device syz0 entered promiscuous mode RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8 R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0 FS: 0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106 tun_detach drivers/net/tun.c:670 [inline] tun_chr_close+0x49/0x60 drivers/net/tun.c:2845 __fput+0x333/0x7f0 fs/file_table.c:210 ____fput+0x15/0x20 fs/file_table.c:244 task_work_run+0x199/0x270 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x9bb/0x1ae0 kernel/exit.c:865 do_group_exit+0x149/0x400 kernel/exit.c:968 SYSC_exit_group kernel/exit.c:979 [inline] SyS_exit_group+0x1d/0x20 kernel/exit.c:977 entry_SYSCALL_64_fastpath+0x1f/0x96 RIP: 0033:0x44ad19 Fixes: 30f7ea1c2b5f ("packet: race condition in packet_bind") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Francesco Ruggeri <fruggeri@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17packet: fix crash in fanout_demux_rollover()Mike Maloney
syzkaller found a race condition fanout_demux_rollover() while removing a packet socket from a fanout group. po->rollover is read and operated on during packet_rcv_fanout(), via fanout_demux_rollover(), but the pointer is currently cleared before the synchronization in packet_release(). It is safer to delay the cleanup until after synchronize_net() has been called, ensuring all calls to packet_rcv_fanout() for this socket have finished. To further simplify synchronization around the rollover structure, set po->rollover in fanout_add() only if there are no errors. This removes the need for rcu in the struct and in the call to packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...). Crashing stack trace: fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392 packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487 dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953 xmit_one net/core/dev.c:2975 [inline] dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995 __dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476 dev_queue_xmit+0x17/0x20 net/core/dev.c:3509 neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379 neigh_output include/net/neighbour.h:482 [inline] ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120 ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146 NF_HOOK_COND include/linux/netfilter.h:239 [inline] ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163 dst_output include/net/dst.h:459 [inline] NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250 mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660 mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072 mld_send_initial_cr net/ipv6/mcast.c:2056 [inline] ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079 addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039 addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971 process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113 worker_thread+0x223/0x1990 kernel/workqueue.c:2247 kthread+0x35e/0x430 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432 Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state") Fixes: 509c7a1ecc860 ("packet: avoid panic in packet_getsockopt()") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Mike Maloney <maloney@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()Eric Dumazet
[ Upstream commit eeea10b83a139451130df1594f26710c8fa390c8 ] James Morris reported kernel stack corruption bug [1] while running the SELinux testsuite, and bisected to a recent commit bffa72cf7f9d ("net: sk_buff rbnode reorg") We believe this commit is fine, but exposes an older bug. SELinux code runs from tcp_filter() and might send an ICMP, expecting IP options to be found in skb->cb[] using regular IPCB placement. We need to defer TCP mangling of skb->cb[] after tcp_filter() calls. This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very similar way we added them for IPv6. [1] [ 339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet [ 339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5 [ 339.822505] [ 339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15 [ 339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A 01/19/2017 [ 339.885060] Call Trace: [ 339.896875] <IRQ> [ 339.908103] dump_stack+0x63/0x87 [ 339.920645] panic+0xe8/0x248 [ 339.932668] ? ip_push_pending_frames+0x33/0x40 [ 339.946328] ? icmp_send+0x525/0x530 [ 339.958861] ? kfree_skbmem+0x60/0x70 [ 339.971431] __stack_chk_fail+0x1b/0x20 [ 339.984049] icmp_send+0x525/0x530 [ 339.996205] ? netlbl_skbuff_err+0x36/0x40 [ 340.008997] ? selinux_netlbl_err+0x11/0x20 [ 340.021816] ? selinux_socket_sock_rcv_skb+0x211/0x230 [ 340.035529] ? security_sock_rcv_skb+0x3b/0x50 [ 340.048471] ? sk_filter_trim_cap+0x44/0x1c0 [ 340.061246] ? tcp_v4_inbound_md5_hash+0x69/0x1b0 [ 340.074562] ? tcp_filter+0x2c/0x40 [ 340.086400] ? tcp_v4_rcv+0x820/0xa20 [ 340.098329] ? ip_local_deliver_finish+0x71/0x1a0 [ 340.111279] ? ip_local_deliver+0x6f/0xe0 [ 340.123535] ? ip_rcv_finish+0x3a0/0x3a0 [ 340.135523] ? ip_rcv_finish+0xdb/0x3a0 [ 340.147442] ? ip_rcv+0x27c/0x3c0 [ 340.158668] ? inet_del_offload+0x40/0x40 [ 340.170580] ? __netif_receive_skb_core+0x4ac/0x900 [ 340.183285] ? rcu_accelerate_cbs+0x5b/0x80 [ 340.195282] ? __netif_receive_skb+0x18/0x60 [ 340.207288] ? process_backlog+0x95/0x140 [ 340.218948] ? net_rx_action+0x26c/0x3b0 [ 340.230416] ? __do_softirq+0xc9/0x26a [ 340.241625] ? do_softirq_own_stack+0x2a/0x40 [ 340.253368] </IRQ> [ 340.262673] ? do_softirq+0x50/0x60 [ 340.273450] ? __local_bh_enable_ip+0x57/0x60 [ 340.285045] ? ip_finish_output2+0x175/0x350 [ 340.296403] ? ip_finish_output+0x127/0x1d0 [ 340.307665] ? nf_hook_slow+0x3c/0xb0 [ 340.318230] ? ip_output+0x72/0xe0 [ 340.328524] ? ip_fragment.constprop.54+0x80/0x80 [ 340.340070] ? ip_local_out+0x35/0x40 [ 340.350497] ? ip_queue_xmit+0x15c/0x3f0 [ 340.361060] ? __kmalloc_reserve.isra.40+0x31/0x90 [ 340.372484] ? __skb_clone+0x2e/0x130 [ 340.382633] ? tcp_transmit_skb+0x558/0xa10 [ 340.393262] ? tcp_connect+0x938/0xad0 [ 340.403370] ? ktime_get_with_offset+0x4c/0xb0 [ 340.414206] ? tcp_v4_connect+0x457/0x4e0 [ 340.424471] ? __inet_stream_connect+0xb3/0x300 [ 340.435195] ? inet_stream_connect+0x3b/0x60 [ 340.445607] ? SYSC_connect+0xd9/0x110 [ 340.455455] ? __audit_syscall_entry+0xaf/0x100 [ 340.466112] ? syscall_trace_enter+0x1d0/0x2b0 [ 340.476636] ? __audit_syscall_exit+0x209/0x290 [ 340.487151] ? SyS_connect+0xe/0x10 [ 340.496453] ? do_syscall_64+0x67/0x1b0 [ 340.506078] ? entry_SYSCALL64_slow_path+0x25/0x25 Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: James Morris <james.l.morris@oracle.com> Tested-by: James Morris <james.l.morris@oracle.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17sit: update frag_off infoHangbin Liu
[ Upstream commit f859b4af1c52493ec21173ccc73d0b60029b5b88 ] After parsing the sit netlink change info, we forget to update frag_off in ipip6_tunnel_update(). Fix it by assigning frag_off with new value. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17rds: Fix NULL pointer dereference in __rds_rdma_mapHåkon Bugge
[ Upstream commit f3069c6d33f6ae63a1668737bc78aaaa51bff7ca ] This is a fix for syzkaller719569, where memory registration was attempted without any underlying transport being loaded. Analysis of the case reveals that it is the setsockopt() RDS_GET_MR (2) and RDS_GET_MR_FOR_DEST (7) that are vulnerable. Here is an example stack trace when the bug is hit: BUG: unable to handle kernel NULL pointer dereference at 00000000000000c0 IP: __rds_rdma_map+0x36/0x440 [rds] PGD 2f93d03067 P4D 2f93d03067 PUD 2f93d02067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: bridge stp llc tun rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rds binfmt_misc sb_edac intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul c rc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt mei_me sg iTCO_vendor_support ipmi_si mei ipmi_devintf nfsd shpchp pcspkr i2c_i801 ioatd ma ipmi_msghandler wmi lpc_ich mfd_core auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper ixgbe syscopyarea ahci sysfillrect sysimgblt libahci mdio fb_sys_fops ttm ptp libata sd_mod mlx4_core drm crc32c_intel pps_core megaraid_sas i2c_core dca dm_mirror dm_region_hash dm_log dm_mod CPU: 48 PID: 45787 Comm: repro_set2 Not tainted 4.14.2-3.el7uek.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017 task: ffff882f9190db00 task.stack: ffffc9002b994000 RIP: 0010:__rds_rdma_map+0x36/0x440 [rds] RSP: 0018:ffffc9002b997df0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff882fa2182580 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffc9002b997e40 RDI: ffff882fa2182580 RBP: ffffc9002b997e30 R08: 0000000000000000 R09: 0000000000000002 R10: ffff885fb29e3838 R11: 0000000000000000 R12: ffff882fa2182580 R13: ffff882fa2182580 R14: 0000000000000002 R15: 0000000020000ffc FS: 00007fbffa20b700(0000) GS:ffff882fbfb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c0 CR3: 0000002f98a66006 CR4: 00000000001606e0 Call Trace: rds_get_mr+0x56/0x80 [rds] rds_setsockopt+0x172/0x340 [rds] ? __fget_light+0x25/0x60 ? __fdget+0x13/0x20 SyS_setsockopt+0x80/0xe0 do_syscall_64+0x67/0x1b0 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fbff9b117f9 RSP: 002b:00007fbffa20aed8 EFLAGS: 00000293 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00000000000c84a4 RCX: 00007fbff9b117f9 RDX: 0000000000000002 RSI: 0000400000000114 RDI: 000000000000109b RBP: 00007fbffa20af10 R08: 0000000000000020 R09: 00007fbff9dd7860 R10: 0000000020000ffc R11: 0000000000000293 R12: 0000000000000000 R13: 00007fbffa20b9c0 R14: 00007fbffa20b700 R15: 0000000000000021 Code: 41 56 41 55 49 89 fd 41 54 53 48 83 ec 18 8b 87 f0 02 00 00 48 89 55 d0 48 89 4d c8 85 c0 0f 84 2d 03 00 00 48 8b 87 00 03 00 00 <48> 83 b8 c0 00 00 00 00 0f 84 25 03 00 0 0 48 8b 06 48 8b 56 08 The fix is to check the existence of an underlying transport in __rds_rdma_map(). Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17vhost: fix skb leak in handle_rx()Wei Xu
[ Upstream commit 6e474083f3daf3a3546737f5d7d502ad12eb257c ] Matthew found a roughly 40% tcp throughput regression with commit c67df11f(vhost_net: try batch dequing from skb array) as discussed in the following thread: https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html Eventually we figured out that it was a skb leak in handle_rx() when sending packets to the VM. This usually happens when a guest can not drain out vq as fast as vhost fills in, afterwards it sets off the traffic jam and leaks skb(s) which occurs as no headcount to send on the vq from vhost side. This can be avoided by making sure we have got enough headcount before actually consuming a skb from the batched rx array while transmitting, which is simply done by moving checking the zero headcount a bit ahead. Signed-off-by: Wei Xu <wexu@redhat.com> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17tipc: fix memory leak in tipc_accept_from_sock()Jon Maloy
[ Upstream commit a7d5f107b4978e08eeab599ee7449af34d034053 ] When the function tipc_accept_from_sock() fails to create an instance of struct tipc_subscriber it omits to free the already created instance of struct tipc_conn instance before it returns. We fix that with this commit. Reported-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17s390/qeth: fix early exit from error pathJulian Wiedmann
[ Upstream commit 83cf79a2fec3cf499eb6cb9eb608656fc2a82776 ] When the allocation of the addr buffer fails, we need to free our refcount on the inetdevice before returning. Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-17net: realtek: r8169: implement set_link_ksettings()Tobias Jakobi
[ Upstream commit 9e77d7a5549dc4d4999a60676373ab3fd1dae4db ] Commit 6fa1ba61520576cf1346c4ff09a056f2950cb3bf partially implemented the new ethtool API, by replacing get_settings() with get_link_ksettings(). This breaks ethtool, since the userspace tool (according to the new API specs) never tries the legacy set() call, when the new get() call succeeds. All attempts to chance some setting from userspace result in: > Cannot set new settings: Operation not supported Implement the missing set() call. Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>