summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2023-06-28rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()Paul E. McKenney
[ Upstream commit a63fc6b75cca984c71f095282e0227a390ba88f3 ] Although the rcu_swap_protected() macro follows the example of swap(), the interactions with RCU make its update of its argument somewhat counter-intuitive. This commit therefore introduces an rcu_replace_pointer() that returns the old value of the RCU pointer instead of doing the argument update. Once all the uses of rcu_swap_protected() are updated to instead use rcu_replace_pointer(), rcu_swap_protected() will be removed. Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/ Reported-by: Linus Torvalds <torvalds@linux-foundation.org> [ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Shane M Seymour <shane.seymour@hpe.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Stable-dep-of: a61675294735 ("ieee802154: hwsim: Fix possible memory leaks") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-28list: add "list_del_init_careful()" to go with "list_empty_careful()"Linus Torvalds
[ Upstream commit c6fe44d96fc1536af5b11cd859686453d1b7bfd1 ] That gives us ordering guarantees around the pair. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: 2192bba03d80 ("epoll: ep_autoremove_wake_function should use list_del_init_careful") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-21Remove DECnet support from kernelStephen Hemminger
commit 1202cdd665315c525b5237e96e0bedc76d7e754f upstream. DECnet is an obsolete network protocol that receives more attention from kernel janitors than users. It belongs in computer protocol history museum not in Linux kernel. It has been "Orphaned" in kernel since 2010. The iproute2 support for DECnet was dropped in 5.0 release. The documentation link on Sourceforge says it is abandoned there as well. Leave the UAPI alone to keep userspace programs compiling. This means that there is still an empty neighbour table for AF_DECNET. The table of /proc/sys/net entries was updated to match current directories and reformatted to be alphabetical. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: David Ahern <dsahern@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-14usb: usbfs: Enforce page requirements for mmapRuihan Li
commit 0143d148d1e882fb1538dc9974c94d63961719b9 upstream. The current implementation of usbdev_mmap uses usb_alloc_coherent to allocate memory pages that will later be mapped into the user space. Meanwhile, usb_alloc_coherent employs three different methods to allocate memory, as outlined below: * If hcd->localmem_pool is non-null, it uses gen_pool_dma_alloc to allocate memory; * If DMA is not available, it uses kmalloc to allocate memory; * Otherwise, it uses dma_alloc_coherent. However, it should be noted that gen_pool_dma_alloc does not guarantee that the resulting memory will be page-aligned. Furthermore, trying to map slab pages (i.e., memory allocated by kmalloc) into the user space is not resonable and can lead to problems, such as a type confusion bug when PAGE_TABLE_CHECK=y [1]. To address these issues, this patch introduces hcd_alloc_coherent_pages, which addresses the above two problems. Specifically, hcd_alloc_coherent_pages uses gen_pool_dma_alloc_align instead of gen_pool_dma_alloc to ensure that the memory is page-aligned. To replace kmalloc, hcd_alloc_coherent_pages directly allocates pages by calling __get_free_pages. Reported-by: syzbot+fcf1a817ceb50935ce99@syzkaller.appspotmail.comm Closes: https://lore.kernel.org/lkml/000000000000258e5e05fae79fc1@google.com/ [1] Fixes: f7d34b445abc ("USB: Add support for usbfs zerocopy.") Fixes: ff2437befd8f ("usb: host: Fix excessive alignment restriction for local memory allocations") Cc: stable@vger.kernel.org Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/20230515130958.32471-2-lrh2000@pku.edu.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-14rfs: annotate lockless accesses to RFS sock flow tableEric Dumazet
[ Upstream commit 5c3b74a92aa285a3df722bf6329ba7ccf70346d6 ] Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table. This also prevents a (smart ?) compiler to remove the condition in: if (table->ents[index] != newval) table->ents[index] = newval; We need the condition to avoid dirtying a shared cache line. Fixes: fec5e652e58f ("rfs: Receive Flow Steering") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider useAndy Shevchenko
[ Upstream commit dd3e7cba16274831f5a69f071ed3cf13ffb352ea ] There are users already and will be more of BITS_TO_BYTES() macro. Move it to bitops.h for wider use. In the case of ocfs2 the replacement is identical. As for bnx2x, there are two places where floor version is used. In the first case to calculate the amount of structures that can fit one memory page. In this case obviously the ceiling variant is correct and original code might have a potential bug, if amount of bits % 8 is not 0. In the second case the macro is used to calculate bytes transmitted in one microsecond. This will work for all speeds which is multiply of 1Gbps without any change, for the rest new code will give ceiling value, for instance 100Mbps will give 13 bytes, while old code gives 12 bytes and the arithmetically correct one is 12.5 bytes. Further the value is used to setup timer threshold which in any case has its own margins due to certain resolution. I don't see here an issue with slightly shifting thresholds for low speed connections, the card is supposed to utilize highest available rate, which is usually 10Gbps. Link: http://lkml.kernel.org/r/20200108121316.22411-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Stable-dep-of: f4e4534850a9 ("net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05fs: fix undefined behavior in bit shift for SB_NOUSERHao Ge
[ Upstream commit f15afbd34d8fadbd375f1212e97837e32bc170cc ] Shifting signed 32-bit value by 31 bits is undefined, so changing significant bit to unsigned. It was spotted by UBSAN. So let's just fix this by using the BIT() helper for all SB_* flags. Fixes: e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") Signed-off-by: Hao Ge <gehao@kylinos.cn> Message-Id: <20230424051835.374204-1-gehao@kylinos.cn> [brauner@kernel.org: use BIT() for all SB_* flags] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05power: supply: core: Refactor ↵Hans de Goede
power_supply_set_input_current_limit_from_supplier() [ Upstream commit 2220af8ca61ae67de4ec3deec1c6395a2f65b9fd ] Some (USB) charger ICs have variants with USB D+ and D- pins to do their own builtin charger-type detection, like e.g. the bq24190 and bq25890 and also variants which lack this functionality, e.g. the bq24192 and bq25892. In case the charger-type; and thus the input-current-limit detection is done outside the charger IC then we need some way to communicate this to the charger IC. In the past extcon was used for this, but if the external detection does e.g. full USB PD negotiation then the extcon cable-types do not convey enough information. For these setups it was decided to model the external charging "brick" and the parameters negotiated with it as a power_supply class-device itself; and power_supply_set_input_current_limit_from_supplier() was introduced to allow drivers to get the input-current-limit this way. But in some cases psy drivers may want to know other properties, e.g. the bq25892 can do "quick-charge" negotiation by pulsing its current draw, but this should only be done if the usb_type psy-property of its supplier is set to DCP (and device-properties indicate the board allows higher voltages). Instead of adding extra helper functions for each property which a psy-driver wants to query from its supplier, refactor power_supply_set_input_current_limit_from_supplier() into a more generic power_supply_get_property_from_supplier() function. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Stable-dep-of: 77c2a3097d70 ("power: supply: bq24190: Call power_supply_changed() after updating input current") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05cdc_ncm: Implement the 32-bit version of NCM Transfer BlockAlexander Bersenev
[ Upstream commit 0fa81b304a7973a499f844176ca031109487dd31 ] The NCM specification defines two formats of transfer blocks: with 16-bit fields (NTB-16) and with 32-bit fields (NTB-32). Currently only NTB-16 is implemented. This patch adds the support of NTB-32. The motivation behind this is that some devices such as E5785 or E5885 from the current generation of Huawei LTE routers do not support NTB-16. The previous generations of Huawei devices are also use NTB-32 by default. Also this patch enables NTB-32 by default for Huawei devices. During the 2019 ValdikSS made five attempts to contact Huawei to add the NTB-16 support to their router firmware, but they were unsuccessful. Signed-off-by: Alexander Bersenev <bay@hackerdom.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: 7e01c7f7046e ("net: cdc_ncm: Deal with too low values of dwNtbOutMaxSize") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30power: supply: bq27xxx: Fix poll_interval handling and races on removeHans de Goede
commit c00bc80462afc7963f449d7f21d896d2f629cacc upstream. Before this patch bq27xxx_battery_teardown() was setting poll_interval = 0 to avoid bq27xxx_battery_update() requeuing the delayed_work item. There are 2 problems with this: 1. If the driver is unbound through sysfs, rather then the module being rmmod-ed, this changes poll_interval unexpectedly 2. This is racy, after it being set poll_interval could be changed before bq27xxx_battery_update() checks it through /sys/module/bq27xxx_battery/parameters/poll_interval Fix this by added a removed attribute to struct bq27xxx_device_info and using that instead of setting poll_interval to 0. There also is another poll_interval related race on remove(), writing /sys/module/bq27xxx_battery/parameters/poll_interval will requeue the delayed_work item for all devices on the bq27xxx_battery_devices list and the device being removed was only removed from that list after cancelling the delayed_work item. Fix this by moving the removal from the bq27xxx_battery_devices list to before cancelling the delayed_work item. Fixes: 8cfaaa811894 ("bq27x00_battery: Fix OOPS caused by unregistring bq27x00 driver") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30USB: core: Add routines for endpoint checks in old driversAlan Stern
commit 13890626501ffda22b18213ddaf7930473da5792 upstream. Many of the older USB drivers in the Linux USB stack were written based simply on a vendor's device specification. They use the endpoint information in the spec and assume these endpoints will always be present, with the properties listed, in any device matching the given vendor and product IDs. While that may have been true back then, with spoofing and fuzzing it is not true any more. More and more we are finding that those old drivers need to perform at least a minimum of checking before they try to use any endpoint other than ep0. To make this checking as simple as possible, we now add a couple of utility routines to the USB core. usb_check_bulk_endpoints() and usb_check_int_endpoints() take an interface pointer together with a list of endpoint addresses (numbers and directions). They check that the interface's current alternate setting includes endpoints with those addresses and that each of these endpoints has the right type: bulk or interrupt, respectively. Although we already have usb_find_common_endpoints() and related routines meant for a similar purpose, they are not well suited for this kind of checking. Those routines find endpoints of various kinds, but only one (either the first or the last) of each kind, and they don't verify that the endpoints' addresses agree with what the caller expects. In theory the new routines could be more general: They could take a particular altsetting as their argument instead of always using the interface's current altsetting. In practice I think this won't matter too much; multiple altsettings tend to be used for transferring media (audio or visual) over isochronous endpoints, not bulk or interrupt. Drivers for such devices will generally require more sophisticated checking than these simplistic routines provide. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/dd2c8e8c-2c87-44ea-ba17-c64b97e201c9@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30net: fix stack overflow when LRO is disabled for virtual interfacesTaehee Yoo
commit ae9b15fbe63447bc1d3bba3769f409d17ca6fdf6 upstream. When the virtual interface's feature is updated, it synchronizes the updated feature for its own lower interface. This propagation logic should be worked as the iteration, not recursively. But it works recursively due to the netdev notification unexpectedly. This problem occurs when it disables LRO only for the team and bonding interface type. team0 | +------+------+-----+-----+ | | | | | team1 team2 team3 ... team200 If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE event to its own lower interfaces(team1 ~ team200). It is worked by netdev_sync_lower_features(). So, the NETDEV_FEAT_CHANGE notification logic of each lower interface work iteratively. But generated NETDEV_FEAT_CHANGE event is also sent to the upper interface too. upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own lower interfaces again. lower and upper interfaces receive this event and generate this event again and again. So, the stack overflow occurs. But it is not the infinite loop issue. Because the netdev_sync_lower_features() updates features before generating the NETDEV_FEAT_CHANGE event. Already synchronized lower interfaces skip notification logic. So, it is just the problem that iteration logic is changed to the recursive unexpectedly due to the notification mechanism. Reproducer: ip link add team0 type team ethtool -K team0 lro on for i in {1..200} do ip link add team$i master team0 type team ethtool -K team$i lro on done ethtool -K team0 lro off In order to fix it, the notifier_ctx member of bonding/team is introduced. Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20230517143010.3596250-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30lib/string_helpers: Introduce string_upper() and string_lower() helpersVadim Pasternak
[ Upstream commit 58eeba0bdb52afe5c18ce2a760ca9fe2901943e9 ] Provide the helpers for string conversions to upper and lower cases. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Stable-dep-of: 3c0f4f09c063 ("usb: gadget: u_ether: Fix host MAC address case") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30sched: Fix KCSAN noinstr violationJosh Poimboeuf
[ Upstream commit e0b081d17a9f4e5c0cbb0e5fbeb1abe3de0f7e4e ] With KCSAN enabled, end_of_stack() can get out-of-lined. Force it inline. Fixes the following warnings: vmlinux.o: warning: objtool: check_stackleak_irqoff+0x2b: call to end_of_stack() leaves .noinstr.text section Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/cc1b4d73d3a428a00d206242a68fdf99a934ca7b.1681320026.git.jpoimboe@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30firmware: arm_sdei: Fix sleep from invalid context BUGPierre Gondois
[ Upstream commit d2c48b2387eb89e0bf2a2e06e30987cf410acad4 ] Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra triggers: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0 preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by cpuhp/0/24: #0: ffffda30217c70d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248 #1: ffffda30217c7120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248 #2: ffffda3021c711f0 (sdei_list_lock){....}-{3:3}, at: sdei_cpuhp_up+0x3c/0x130 irq event stamp: 36 hardirqs last enabled at (35): [<ffffda301e85b7bc>] finish_task_switch+0xb4/0x2b0 hardirqs last disabled at (36): [<ffffda301e812fec>] cpuhp_thread_fun+0x21c/0x248 softirqs last enabled at (0): [<ffffda301e80b184>] copy_process+0x63c/0x1ac0 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 0 PID: 24 Comm: cpuhp/0 Not tainted 5.19.0-rc3-rt5-[...] Hardware name: WIWYNN Mt.Jade Server [...] Call trace: dump_backtrace+0x114/0x120 show_stack+0x20/0x70 dump_stack_lvl+0x9c/0xd8 dump_stack+0x18/0x34 __might_resched+0x188/0x228 rt_spin_lock+0x70/0x120 sdei_cpuhp_up+0x3c/0x130 cpuhp_invoke_callback+0x250/0xf08 cpuhp_thread_fun+0x120/0x248 smpboot_thread_fn+0x280/0x320 kthread+0x130/0x140 ret_from_fork+0x10/0x20 sdei_cpuhp_up() is called in the STARTING hotplug section, which runs with interrupts disabled. Use a CPUHP_AP_ONLINE_DYN entry instead to execute the cpuhp cb later, with preemption enabled. SDEI originally got its own cpuhp slot to allow interacting with perf. It got superseded by pNMI and this early slot is not relevant anymore. [1] Some SDEI calls (e.g. SDEI_1_0_FN_SDEI_PE_MASK) take actions on the calling CPU. It is checked that preemption is disabled for them. _ONLINE cpuhp cb are executed in the 'per CPU hotplug thread'. Preemption is enabled in those threads, but their cpumask is limited to 1 CPU. Move 'WARN_ON_ONCE(preemptible())' statements so that SDEI cpuhp cb don't trigger them. Also add a check for the SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI call which acts on the calling CPU. [1]: https://lore.kernel.org/all/5813b8c5-ae3e-87fd-fccc-94c9cd08816d@arm.com/ Suggested-by: James Morse <james.morse@arm.com> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20230216084920.144064-1-pierre.gondois@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30net: add vlan_get_protocol_and_depth() helperEric Dumazet
[ Upstream commit 4063384ef762cc5946fc7a3f89879e76c6ec51e2 ] Before blamed commit, pskb_may_pull() was used instead of skb_header_pointer() in __vlan_get_protocol() and friends. Few callers depended on skb->head being populated with MAC header, syzbot caught one of them (skb_mac_gso_segment()) Add vlan_get_protocol_and_depth() to make the intent clearer and use it where sensible. This is a more generic fix than commit e9d3f80935b6 ("net/af_packet: make sure to pull mac header") which was dealing with a similar issue. kernel BUG at include/linux/skbuff.h:2655 ! invalid opcode: 0000 [#1] SMP KASAN CPU: 0 PID: 1441 Comm: syz-executor199 Not tainted 6.1.24-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023 RIP: 0010:__skb_pull include/linux/skbuff.h:2655 [inline] RIP: 0010:skb_mac_gso_segment+0x68f/0x6a0 net/core/gro.c:136 Code: fd 48 8b 5c 24 10 44 89 6b 70 48 c7 c7 c0 ae 0d 86 44 89 e6 e8 a1 91 d0 00 48 c7 c7 00 af 0d 86 48 89 de 31 d2 e8 d1 4a e9 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 RSP: 0018:ffffc90001bd7520 EFLAGS: 00010286 RAX: ffffffff8469736a RBX: ffff88810f31dac0 RCX: ffff888115a18b00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90001bd75e8 R08: ffffffff84697183 R09: fffff5200037adf9 R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000012 R13: 000000000000fee5 R14: 0000000000005865 R15: 000000000000fed7 FS: 000055555633f300(0000) GS:ffff8881f6a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 0000000116fea000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> [<ffffffff847018dd>] __skb_gso_segment+0x32d/0x4c0 net/core/dev.c:3419 [<ffffffff8470398a>] skb_gso_segment include/linux/netdevice.h:4819 [inline] [<ffffffff8470398a>] validate_xmit_skb+0x3aa/0xee0 net/core/dev.c:3725 [<ffffffff84707042>] __dev_queue_xmit+0x1332/0x3300 net/core/dev.c:4313 [<ffffffff851a9ec7>] dev_queue_xmit+0x17/0x20 include/linux/netdevice.h:3029 [<ffffffff851b4a82>] packet_snd net/packet/af_packet.c:3111 [inline] [<ffffffff851b4a82>] packet_sendmsg+0x49d2/0x6470 net/packet/af_packet.c:3142 [<ffffffff84669a12>] sock_sendmsg_nosec net/socket.c:716 [inline] [<ffffffff84669a12>] sock_sendmsg net/socket.c:736 [inline] [<ffffffff84669a12>] __sys_sendto+0x472/0x5f0 net/socket.c:2139 [<ffffffff84669c75>] __do_sys_sendto net/socket.c:2151 [inline] [<ffffffff84669c75>] __se_sys_sendto net/socket.c:2147 [inline] [<ffffffff84669c75>] __x64_sys_sendto+0xe5/0x100 net/socket.c:2147 [<ffffffff8551d40f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff8551d40f>] do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80 [<ffffffff85600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 469aceddfa3e ("vlan: consolidate VLAN parsing code and limit max parsing depth") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30linux/dim: Do nothing if no time delta between samplesRoy Novich
[ Upstream commit 162bd18eb55adf464a0fa2b4144b8d61c75ff7c2 ] Add return value for dim_calc_stats. This is an indication for the caller if curr_stats was assigned by the function. Avoid using curr_stats uninitialized over {rdma/net}_dim, when no time delta between samples. Coverity reported this potential use of an uninitialized variable. Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux") Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://lore.kernel.org/r/20230507135743.138993-1-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30driver core: add a helper to setup both the of_node and fwnode of a deviceIoana Ciornei
[ Upstream commit 43e76d463c09a0272b84775bcc727c1eb8b384b2 ] There are many places where both the fwnode_handle and the of_node of a device need to be populated. Add a function which does both so that we have consistency. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: a26cc2934331 ("drm/mipi-dsi: Set the fwnode for mipi_dsi_device") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17PM: domains: Restore comment indentation for generic_pm_domain.child_linksGeert Uytterhoeven
commit afb0367a80553e795e7ad055299096544454f3f6 upstream. The rename of generic_pm_domain.slave_links to generic_pm_domain.child_links accidentally dropped the TAB to align the member's comment. Re-add the lost TAB to restore indentation. Fixes: 8d87ae48ced2dffd ("PM: domains: Fix up terminology with parent/child") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> [ rjw: Minor subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17printk: declare printk_deferred_{enter,safe}() in include/linux/printk.hTetsuo Handa
commit 85e3e7fbbb720b9897fba9a99659e31cbd1c082e upstream. [This patch implements subset of original commit 85e3e7fbbb72 ("printk: remove NMI tracking") where commit 1007843a9190 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock") depends on, for commit 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") was backported to stable.] All NMI contexts are handled the same as the safe context: store the message and defer printing. There is no need to have special NMI context tracking for this. Using in_nmi() is enough. There are several parts of the kernel that are manually calling into the printk NMI context tracking in order to cause general printk deferred printing: arch/arm/kernel/smp.c arch/powerpc/kexec/crash.c kernel/trace/trace.c For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new function pair printk_deferred_enter/exit that explicitly achieves the same objective. For ftrace, remove the printk context manipulation completely. It was added in commit 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI"). The purpose was to enforce storing messages directly into the ring buffer even in NMI context. It really should have only modified the behavior in NMI context. There is no need for a special behavior any longer. All messages are always stored directly now. The console deferring is handled transparently in vprintk(). Signed-off-by: John Ogness <john.ogness@linutronix.de> [pmladek@suse.com: Remove special handling in ftrace.c completely. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de [penguin-kernel: Copy only printk_deferred_{enter,safe}() definition ] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17tty: Prevent writing chars during tcsetattr TCSADRAIN/FLUSHIlpo Järvinen
If userspace races tcsetattr() with a write, the drained condition might not be guaranteed by the kernel. There is a race window after checking Tx is empty before tty_set_termios() takes termios_rwsem for write. During that race window, more characters can be queued by a racing writer. Any ongoing transmission might produce garbage during HW's ->set_termios() call. The intent of TCSADRAIN/FLUSH seems to be preventing such a character corruption. If those flags are set, take tty's write lock to stop any writer before performing the lower layer Tx empty check and wait for the pending characters to be sent (if any). The initial wait for all-writers-done must be placed outside of tty's write lock to avoid deadlock which makes it impossible to use tty_wait_until_sent(). The write lock is retried if a racing write is detected. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230317113318.31327-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 094fb49a2d0d6827c86d2e0840873e6db0c491d2) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystemJoel Fernandes (Google)
[ Upstream commit 58d7668242647e661a20efe065519abd6454287e ] For CONFIG_NO_HZ_FULL systems, the tick_do_timer_cpu cannot be offlined. However, cpu_is_hotpluggable() still returns true for those CPUs. This causes torture tests that do offlining to end up trying to offline this CPU causing test failures. Such failure happens on all architectures. Fix the repeated error messages thrown by this (even if the hotplug errors are harmless) by asking the opinion of the nohz subsystem on whether the CPU can be hotplugged. [ Apply Frederic Weisbecker feedback on refactoring tick_nohz_cpu_down(). ] For drivers/base/ portion: Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Zhouyi Zhou <zhouzhouyi@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: rcu <rcu@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: 2987557f52b9 ("driver-core/cpu: Expose hotpluggability to the rest of the kernel") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17nohz: Add TICK_DEP_BIT_RCUFrederic Weisbecker
[ Upstream commit 01b4c39901e087ceebae2733857248de81476bd8 ] If a nohz_full CPU is looping in the kernel, the scheduling-clock tick might nevertheless remain disabled. In !PREEMPT kernels, this can prevent RCU's attempts to enlist the aid of that CPU's executions of cond_resched(), which can in turn result in an arbitrarily delayed grace period and thus an OOM. RCU therefore needs a way to enable a holdout nohz_full CPU's scheduler-clock interrupt. This commit therefore provides a new TICK_DEP_BIT_RCU value which RCU can pass to tick_dep_set_cpu() and friends to force on the scheduler-clock interrupt for a specified CPU or task. In some cases, rcutorture needs to turn on the scheduler-clock tick, so this commit also exports the relevant symbols to GPL-licensed modules. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Stable-dep-of: 58d766824264 ("tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17mailbox: zynqmp: Fix typo in IPI documentationTanmay Shah
commit 79963fbfc233759bd8a43462f120d15a1bd4f4fa upstream. Xilinx IPI message buffers allows 32-byte data transfer. Fix documentation that says 12 bytes Fixes: 4981b82ba2ff ("mailbox: ZynqMP IPI mailbox controller") Signed-off-by: Tanmay Shah <tanmay.shah@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230311012407.1292118-4-tanmay.shah@amd.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17SUNRPC: remove the maximum number of retries in call_bind_statusDai Ngo
[ Upstream commit 691d0b782066a6eeeecbfceb7910a8f6184e6105 ] Currently call_bind_status places a hard limit of 3 to the number of retries on EACCES error. This limit was done to prevent NLM unlock requests from being hang forever when the server keeps returning garbage. However this change causes problem for cases when NLM service takes longer than 9 seconds to register with the port mapper after a restart. This patch removes this hard coded limit and let the RPC handles the retry based on the standard hard/soft task semantics. Fixes: 0b760113a3a1 ("NLM: Don't hang forever on NLM unlock requests") Reported-by: Helen Chao <helen.chao@oracle.com> Tested-by: Helen Chao <helen.chao@oracle.com> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17PM: domains: Fix up terminology with parent/childKees Cook
[ Upstream commit 8d87ae48ced2dffd5e7247d19eb4c88be6f1c6f1 ] The genpd infrastructure uses the terms master/slave, but such uses have no external exposures (not even in Documentation/driver-api/pm/*) and are not mandated by nor associated with any external specifications. Change the language used through-out to parent/child. There was one possible exception in the debugfs node "pm_genpd/pm_genpd_summary" but its path has no hits outside of the kernel itself when performing a code search[1], and it seems even this single usage has been non-functional since it was introduced due to a typo in the Python ("apend" instead of correct "append"). Fix the typo while we're at it. Link: https://codesearch.debian.net/ # [1] Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: f19c3c2959e4 ("scripts/gdb: bail early if there are no generic PD") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17linux/vt_buffer.h: allow either builtin or modular for macrosRandy Dunlap
[ Upstream commit 2b76ffe81e32afd6d318dc4547e2ba8c46207b77 ] Fix build errors on ARCH=alpha when CONFIG_MDA_CONSOLE=m. This allows the ARCH macros to be the only ones defined. In file included from ../drivers/video/console/mdacon.c:37: ../arch/alpha/include/asm/vga.h:17:40: error: expected identifier or '(' before 'volatile' 17 | static inline void scr_writew(u16 val, volatile u16 *addr) | ^~~~~~~~ ../include/linux/vt_buffer.h:24:34: note: in definition of macro 'scr_writew' 24 | #define scr_writew(val, addr) (*(addr) = (val)) | ^~~~ ../include/linux/vt_buffer.h:24:40: error: expected ')' before '=' token 24 | #define scr_writew(val, addr) (*(addr) = (val)) | ^ ../arch/alpha/include/asm/vga.h:17:20: note: in expansion of macro 'scr_writew' 17 | static inline void scr_writew(u16 val, volatile u16 *addr) | ^~~~~~~~~~ ../arch/alpha/include/asm/vga.h:25:29: error: expected identifier or '(' before 'volatile' 25 | static inline u16 scr_readw(volatile const u16 *addr) | ^~~~~~~~ Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: dri-devel@lists.freedesktop.org Cc: linux-fbdev@vger.kernel.org Link: https://lore.kernel.org/r/20230329021529.16188-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17genirq: Add IRQF_NO_AUTOEN for request_irq/nmi()Barry Song
[ Upstream commit cbe16f35bee6880becca6f20d2ebf6b457148552 ] Many drivers don't want interrupts enabled automatically via request_irq(). So they are handling this issue by either way of the below two: (1) irq_set_status_flags(irq, IRQ_NOAUTOEN); request_irq(dev, irq...); (2) request_irq(dev, irq...); disable_irq(irq); The code in the second way is silly and unsafe. In the small time gap between request_irq() and disable_irq(), interrupts can still come. The code in the first way is safe though it's subobtimal. Add a new IRQF_NO_AUTOEN flag which can be handed in by drivers to request_irq() and request_nmi(). It prevents the automatic enabling of the requested interrupt/nmi in the same safe way as #1 above. With that the various usage sites of #1 and #2 above can be simplified and corrected. Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: dmitry.torokhov@gmail.com Link: https://lore.kernel.org/r/20210302224916.13980-2-song.bao.hua@hisilicon.com Stable-dep-of: 39db65a0a17b ("ASoC: es8316: Handle optional IRQ assignment") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17netfilter: nf_tables: don't write table validation state without mutexFlorian Westphal
[ Upstream commit 9a32e9850686599ed194ccdceb6cd3dd56b2d9b9 ] The ->cleanup callback needs to be removed, this doesn't work anymore as the transaction mutex is already released in the ->abort function. Just do it after a successful validation pass, this either happens from commit or abort phases where transaction mutex is held. Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17nvme: handle the persistent internal error AERMichael Kelley
[ Upstream commit 2c61c97fb12b806e1c8eb15f04c277ad097ec95e ] In the NVM Express Revision 1.4 spec, Figure 145 describes possible values for an AER with event type "Error" (value 000b). For a Persistent Internal Error (value 03h), the host should perform a controller reset. Add support for this error using code that already exists for doing a controller reset. As part of this support, introduce two utility functions for parsing the AER type and subtype. This new support was tested in a lab environment where we can generate the persistent internal error on demand, and observe both the Linux side and NVMe controller side to see that the controller reset has been done. Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: 6622b76fe922 ("nvme: fix async event trace event") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17debugfs: regset32: Add Runtime PM supportGeert Uytterhoeven
commit 30332eeefec8f83afcea00c360f99ef64b87f220 upstream. Hardware registers of devices under control of power management cannot be accessed at all times. If such a device is suspended, register accesses may lead to undefined behavior, like reading bogus values, or causing exceptions or system lock-ups. Extend struct debugfs_regset32 with an optional field to let device drivers specify the device the registers in the set belong to. This allows debugfs_show_regset32() to make sure the device is resumed while its registers are being read. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund@ragnatech.se> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26netfilter: nf_tables: fix ifdef to also consider nf_tables=mFlorian Westphal
[ Upstream commit c55c0e91c813589dc55bea6bf9a9fbfaa10ae41d ] nftables can be built as a module, so fix the preprocessor conditional accordingly. Fixes: 478b360a47b7 ("netfilter: nf_tables: fix nf_trace always-on with XT_TRACE=n") Reported-by: Florian Fainelli <f.fainelli@gmail.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-26netfilter: br_netfilter: fix recent physdev match breakageFlorian Westphal
[ Upstream commit 94623f579ce338b5fa61b5acaa5beb8aa657fb9e ] Recent attempt to ensure PREROUTING hook is executed again when a decrypted ipsec packet received on a bridge passes through the network stack a second time broke the physdev match in INPUT hook. We can't discard the nf_bridge info strct from sabotage_in hook, as this is needed by the physdev match. Keep the struct around and handle this with another conditional instead. Fixes: 2b272bb558f1 ("netfilter: br_netfilter: disable sabotage_in hook after first suppression") Reported-and-tested-by: Farid BENAMROUCHE <fariouche@yahoo.fr> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-20ftrace: Mark get_lock_parent_ip() __always_inlineJohn Keeping
commit ea65b41807a26495ff2a73dd8b1bab2751940887 upstream. If the compiler decides not to inline this function then preemption tracing will always show an IP inside the preemption disabling path and never the function actually calling preempt_{enable,disable}. Link: https://lore.kernel.org/linux-trace-kernel/20230327173647.1690849-1-john@metanate.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: f904f58263e1d ("sched/debug: Fix preempt_disable_ip recording for preempt_disable()") Signed-off-by: John Keeping <john@metanate.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()Kees Cook
[ Upstream commit b13fecb1c3a603c4b8e99b306fecf4f668c11b32 ] This converts all the existing DECLARE_TASKLET() (and ...DISABLED) macros with DECLARE_TASKLET_OLD() in preparation for refactoring the tasklet callback type. All existing DECLARE_TASKLET() users had a "0" data argument, it has been removed here as well. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Stable-dep-of: 1fdeb8b9f29d ("wifi: iwl3945: Add missing check for create_singlethread_workqueue") Signed-off-by: Sasha Levin <sashal@kernel.org> [Tom: fix backport to 5.4.y] AUTOSEL backport to 5.4.y of: b13fecb1c3a6 ("treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()") changed all locations of DECLARE_TASKLET with DECLARE_TASKLET_OLD, except one, in arch/mips/lasat/pcivue_proc.c. This is due to: 10760dde9be3 ("MIPS: Remove support for LASAT") preceeding b13fecb1c3a6 ("treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()") upstream and the former not being present in 5.4.y. Fix this by changing DECLARE_TASKLET to DECLARE_TASKLET_OLD in arch/mips/lasat/pcivue_proc.c. Fixes: 5de7a4254eb2 ("treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()") Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20Revert "treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()"Tom Saeger
This reverts commit 5de7a4254eb2d501cbb59918a152665b29c02109 which caused mips build failures. kernelci.org bot reports: arch/mips/lasat/picvue_proc.c:87:20: error: ‘pvc_display_tasklet’ undeclared (first use in this function) arch/mips/lasat/picvue_proc.c:42:44: error: expected ‘)’ before ‘&’ token arch/mips/lasat/picvue_proc.c:33:13: error: ‘pvc_display’ defined but not used [-Werror=unused-function] Link: https://lore.kernel.org/stable/64041dda.170a0220.8cc25.79c9@mx.google.com/ Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-05net_sched: add __rcu annotation to netdev->qdiscEric Dumazet
commit 5891cd5ec46c2c2eb6427cb54d214b149635dd0e upstream. syzbot found a data-race [1] which lead me to add __rcu annotations to netdev->qdisc, and proper accessors to get LOCKDEP support. [1] BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1: attach_default_qdiscs net/sched/sch_generic.c:1167 [inline] dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221 __dev_open+0x2e9/0x3a0 net/core/dev.c:1416 __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139 rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150 __rtnl_newlink net/core/rtnetlink.c:3489 [inline] rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0: qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323 __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050 tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211 rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 470502de5bdb ("net: sched: unlock rules update API") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@mellanox.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zubin Mithra <zsm@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-05dma-mapping: drop the dev argument to arch_sync_dma_for_*Christoph Hellwig
[ Upstream commit 56e35f9c5b87ec1ae93e483284e189c84388de16 ] These are pure cache maintainance routines, so drop the unused struct device argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Stable-dep-of: ab327f8acdf8 ("mips: bmips: BCM6358: disable RAC flush for TP1") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05nvme-tcp: fix nvme_tcp_term_pdu to match specCaleb Sander
[ Upstream commit aa01c67de5926fdb276793180564f172c55fb0d7 ] The FEI field of C2HTermReq/H2CTermReq is 4 bytes but not 4-byte-aligned in the NVMe/TCP specification (it is located at offset 10 in the PDU). Split it into two 16-bit integers in struct nvme_tcp_term_pdu so no padding is inserted. There should also be 10 reserved bytes after. There are currently no users of this type. Fixes: fc221d05447aa6db ("nvme-tcp: Add protocol header") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Caleb Sander <csander@purestorage.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-22HID: core: Provide new max_buffer_size attribute to over-ride the defaultLee Jones
commit b1a37ed00d7908a991c1d0f18a8cba3c2aa99bdc upstream. Presently, when a report is processed, its proposed size, provided by the user of the API (as Report Size * Report Count) is compared against the subsystem default HID_MAX_BUFFER_SIZE (16k). However, some low-level HID drivers allocate a reduced amount of memory to their buffers (e.g. UHID only allocates UHID_DATA_MAX (4k) buffers), rending this check inadequate in some cases. In these circumstances, if the received report ends up being smaller than the proposed report size, the remainder of the buffer is zeroed. That is, the space between sizeof(csize) (size of the current report) and the rsize (size proposed i.e. Report Size * Report Count), which can be handled up to HID_MAX_BUFFER_SIZE (16k). Meaning that memset() shoots straight past the end of the buffer boundary and starts zeroing out in-use values, often resulting in calamity. This patch introduces a new variable into 'struct hid_ll_driver' where individual low-level drivers can over-ride the default maximum value of HID_MAX_BUFFER_SIZE (16k) with something more sympathetic to the interface. Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> [Lee: Backported to v5.10.y] Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-22tracing: Make tracepoint lockdep check actually test somethingSteven Rostedt (Google)
commit c2679254b9c9980d9045f0f722cf093a2b1f7590 upstream. A while ago where the trace events had the following: rcu_read_lock_sched_notrace(); rcu_dereference_sched(...); rcu_read_unlock_sched_notrace(); If the tracepoint is enabled, it could trigger RCU issues if called in the wrong place. And this warning was only triggered if lockdep was enabled. If the tracepoint was never enabled with lockdep, the bug would not be caught. To handle this, the above sequence was done when lockdep was enabled regardless if the tracepoint was enabled or not (although the always enabled code really didn't do anything, it would still trigger a warning). But a lot has changed since that lockdep code was added. One is, that sequence no longer triggers any warning. Another is, the tracepoint when enabled doesn't even do that sequence anymore. The main check we care about today is whether RCU is "watching" or not. So if lockdep is enabled, always check if rcu_is_watching() which will trigger a warning if it is not (tracepoints require RCU to be watching). Note, that old sequence did add a bit of overhead when lockdep was enabled, and with the latest kernel updates, would cause the system to slow down enough to trigger kernel "stalled" warnings. Link: http://lore.kernel.org/lkml/20140806181801.GA4605@redhat.com Link: http://lore.kernel.org/lkml/20140807175204.C257CAC5@viggo.jf.intel.com Link: https://lore.kernel.org/lkml/20230307184645.521db5c9@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20230310172856.77406446@gandalf.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Joel Fernandes <joel@joelfernandes.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Fixes: e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-22sh: intc: Avoid spurious sizeof-pointer-div warningMichael Karcher
[ Upstream commit 250870824c1cf199b032b1ef889c8e8d69d9123a ] GCC warns about the pattern sizeof(void*)/sizeof(void), as it looks like the abuse of a pattern to calculate the array size. This pattern appears in the unevaluated part of the ternary operator in _INTC_ARRAY if the parameter is NULL. The replacement uses an alternate approach to return 0 in case of NULL which does not generate the pattern sizeof(void*)/sizeof(void), but still emits the warning if _INTC_ARRAY is called with a nonarray parameter. This patch is required for successful compilation with -Werror enabled. The idea to use _Generic for type distinction is taken from Comment #7 in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108483 by Jakub Jelinek Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/619fa552-c988-35e5-b1d7-fe256c46a272@mkarcher.dialup.fu-berlin.de Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-22net: tunnels: annotate lockless accesses to dev->needed_headroomEric Dumazet
[ Upstream commit 4b397c06cb987935b1b097336532aa6b4210e091 ] IP tunnels can apparently update dev->needed_headroom in their xmit path. This patch takes care of three tunnels xmit, and also the core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA() helpers. More changes might be needed for completeness. BUG: KCSAN: data-race in ip_tunnel_xmit / ip_tunnel_xmit read to 0xffff88815b9da0ec of 2 bytes by task 888 on cpu 1: ip_tunnel_xmit+0x1270/0x1730 net/ipv4/ip_tunnel.c:803 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228 ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:444 [inline] ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126 iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82 ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 write to 0xffff88815b9da0ec of 2 bytes by task 2379 on cpu 0: ip_tunnel_xmit+0x1294/0x1730 net/ipv4/ip_tunnel.c:804 __gre_xmit net/ipv4/ip_gre.c:469 [inline] ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661 __netdev_start_xmit include/linux/netdevice.h:4881 [inline] netdev_start_xmit include/linux/netdevice.h:4895 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596 __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246 dev_queue_xmit include/linux/netdevice.h:3051 [inline] neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623 neigh_output include/net/neighbour.h:546 [inline] ip6_finish_output2+0x9bc/0xc50 net/ipv6/ip6_output.c:134 __ip6_finish_output net/ipv6/ip6_output.c:195 [inline] ip6_finish_output+0x39a/0x4e0 net/ipv6/ip6_output.c:206 NF_HOOK_COND include/linux/netfilter.h:291 [inline] ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:227 dst_output include/net/dst.h:444 [inline] NF_HOOK include/linux/netfilter.h:302 [inline] mld_sendpack+0x438/0x6a0 net/ipv6/mcast.c:1820 mld_send_cr net/ipv6/mcast.c:2121 [inline] mld_ifc_work+0x519/0x7b0 net/ipv6/mcast.c:2653 process_one_work+0x3e6/0x750 kernel/workqueue.c:2390 worker_thread+0x5f2/0xa10 kernel/workqueue.c:2537 kthread+0x1ac/0x1e0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 value changed: 0x0dd4 -> 0x0e14 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 2379 Comm: kworker/0:0 Not tainted 6.3.0-rc1-syzkaller-00002-g8ca09d5fa354-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 Workqueue: mld mld_ifc_work Fixes: 8eb30be0352d ("ipv6: Create ip6_tnl_xmit") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230310191109.2384387-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17PCI: Add SolidRun vendor IDAlvaro Karsz
[ Upstream commit db6c4dee4c104f50ed163af71c53bfdb878a8318 ] Add SolidRun vendor ID to pci_ids.h The vendor ID is used in 2 different source files, the SNET vDPA driver and PCI quirks. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Message-Id: <20230110165638.123745-2-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17irqdomain: Change the type of 'size' in __irq_domain_add() to be consistentBixuan Cui
[ Upstream commit 20c36ce2164f1774b487d443ece99b754bc6ad43 ] The 'size' is used in struct_size(domain, revmap, size) and its input parameter type is 'size_t'(unsigned int). Changing the size to 'unsigned int' to make the type consistent. Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210916025203.44841-1-cuibixuan@huawei.com Stable-dep-of: 8932c32c3053 ("irqdomain: Fix domain registration race") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11PCI: Add ACS quirk for Wangxun NICsMengyuan Lou
[ Upstream commit a2b9b123ccac913e9f9b80337d687a2fe786a634 ] Wangxun has verified there is no peer-to-peer between functions for the below selection of SFxxx, RP1000 and RP2000 NICS. They may be multi-function devices, but the hardware does not advertise ACS capability. Add an ACS quirk for these devices so the functions can be in independent IOMMU groups. Link: https://lore.kernel.org/r/20230207102419.44326-1-mengyuanlou@net-swift.com Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-11ima: Align ima_file_mmap() parameters with mmap_file LSM hookRoberto Sassu
commit 4971c268b85e1c7a734a61622fc0813c86e2362e upstream. Commit 98de59bfe4b2f ("take calculation of final prot in security_mmap_file() into a helper") moved the code to update prot, to be the actual protections applied to the kernel, to a new helper called mmap_prot(). However, while without the helper ima_file_mmap() was getting the updated prot, with the helper ima_file_mmap() gets the original prot, which contains the protections requested by the application. A possible consequence of this change is that, if an application calls mmap() with only PROT_READ, and the kernel applies PROT_EXEC in addition, that application would have access to executable memory without having this event recorded in the IMA measurement list. This situation would occur for example if the application, before mmap(), calls the personality() system call with READ_IMPLIES_EXEC as the first argument. Align ima_file_mmap() parameters with those of the mmap_file LSM hook, so that IMA can receive both the requested prot and the final prot. Since the requested protections are stored in a new variable, and the final protections are stored in the existing variable, this effectively restores the original behavior of the MMAP_CHECK hook. Cc: stable@vger.kernel.org Fixes: 98de59bfe4b2 ("take calculation of final prot in security_mmap_file() into a helper") Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe rangeYang Jihong
commit f1c97a1b4ef709e3f066f82e3ba3108c3b133ae6 upstream. When arch_prepare_optimized_kprobe calculating jump destination address, it copies original instructions from jmp-optimized kprobe (see __recover_optprobed_insn), and calculated based on length of original instruction. arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when checking whether jmp-optimized kprobe exists. As a result, setup_detour_execution may jump to a range that has been overwritten by jump destination address, resulting in an inval opcode error. For example, assume that register two kprobes whose addresses are <func+9> and <func+11> in "func" function. The original code of "func" function is as follows: 0xffffffff816cb5e9 <+9>: push %r12 0xffffffff816cb5eb <+11>: xor %r12d,%r12d 0xffffffff816cb5ee <+14>: test %rdi,%rdi 0xffffffff816cb5f1 <+17>: setne %r12b 0xffffffff816cb5f5 <+21>: push %rbp 1.Register the kprobe for <func+11>, assume that is kp1, corresponding optimized_kprobe is op1. After the optimization, "func" code changes to: 0xffffffff816cc079 <+9>: push %r12 0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000 0xffffffff816cc080 <+16>: incl 0xf(%rcx) 0xffffffff816cc083 <+19>: xchg %eax,%ebp 0xffffffff816cc084 <+20>: (bad) 0xffffffff816cc085 <+21>: push %rbp Now op1->flags == KPROBE_FLAG_OPTIMATED; 2. Register the kprobe for <func+9>, assume that is kp2, corresponding optimized_kprobe is op2. register_kprobe(kp2) register_aggr_kprobe alloc_aggr_kprobe __prepare_optimized_kprobe arch_prepare_optimized_kprobe __recover_optprobed_insn // copy original bytes from kp1->optinsn.copied_insn, // jump address = <func+14> 3. disable kp1: disable_kprobe(kp1) __disable_kprobe ... if (p == orig_p || aggr_kprobe_disabled(orig_p)) { ret = disarm_kprobe(orig_p, true) // add op1 in unoptimizing_list, not unoptimized orig_p->flags |= KPROBE_FLAG_DISABLED; // op1->flags == KPROBE_FLAG_OPTIMATED | KPROBE_FLAG_DISABLED ... 4. unregister kp2 __unregister_kprobe_top ... if (!kprobe_disabled(ap) && !kprobes_all_disarmed) { optimize_kprobe(op) ... if (arch_check_optimized_kprobe(op) < 0) // because op1 has KPROBE_FLAG_DISABLED, here not return return; p->kp.flags |= KPROBE_FLAG_OPTIMIZED; // now op2 has KPROBE_FLAG_OPTIMIZED } "func" code now is: 0xffffffff816cc079 <+9>: int3 0xffffffff816cc07a <+10>: push %rsp 0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000 0xffffffff816cc080 <+16>: incl 0xf(%rcx) 0xffffffff816cc083 <+19>: xchg %eax,%ebp 0xffffffff816cc084 <+20>: (bad) 0xffffffff816cc085 <+21>: push %rbp 5. if call "func", int3 handler call setup_detour_execution: if (p->flags & KPROBE_FLAG_OPTIMIZED) { ... regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX; ... } The code for the destination address is 0xffffffffa021072c: push %r12 0xffffffffa021072e: xor %r12d,%r12d 0xffffffffa0210731: jmp 0xffffffff816cb5ee <func+14> However, <func+14> is not a valid start instruction address. As a result, an error occurs. Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/ Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11x86/kprobes: Fix __recover_optprobed_insn check optimizing logicYang Jihong
commit 868a6fc0ca2407622d2833adefe1c4d284766c4c upstream. Since the following commit: commit f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") modified the update timing of the KPROBE_FLAG_OPTIMIZED, a optimized_kprobe may be in the optimizing or unoptimizing state when op.kp->flags has KPROBE_FLAG_OPTIMIZED and op->list is not empty. The __recover_optprobed_insn check logic is incorrect, a kprobe in the unoptimizing state may be incorrectly determined as unoptimizing. As a result, incorrect instructions are copied. The optprobe_queued_unopt function needs to be exported for invoking in arch directory. Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/ Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Cc: stable@vger.kernel.org Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-11uaccess: Add minimum bounds check on kernel buffer sizeKees Cook
[ Upstream commit 04ffde1319a715bd0550ded3580d4ea3bc003776 ] While there is logic about the difference between ksize and usize, copy_struct_from_user() didn't check the size of the destination buffer (when it was known) against ksize. Add this check so there is an upper bounds check on the possible memset() call, otherwise lower bounds checks made by callers will trigger bounds warnings under -Warray-bounds. Seen under GCC 13: In function 'copy_struct_from_user', inlined from 'iommufd_fops_ioctl' at ../drivers/iommu/iommufd/main.c:333:8: ../include/linux/fortify-string.h:59:33: warning: '__builtin_memset' offset [57, 4294967294] is out of the bounds [0, 56] of object 'buf' with type 'union ucmd_buffer' [-Warray-bounds=] 59 | #define __underlying_memset __builtin_memset | ^ ../include/linux/fortify-string.h:453:9: note: in expansion of macro '__underlying_memset' 453 | __underlying_memset(p, c, __fortify_size); \ | ^~~~~~~~~~~~~~~~~~~ ../include/linux/fortify-string.h:461:25: note: in expansion of macro '__fortify_memset_chk' 461 | #define memset(p, c, s) __fortify_memset_chk(p, c, s, \ | ^~~~~~~~~~~~~~~~~~~~ ../include/linux/uaccess.h:334:17: note: in expansion of macro 'memset' 334 | memset(dst + size, 0, rest); | ^~~~~~ ../drivers/iommu/iommufd/main.c: In function 'iommufd_fops_ioctl': ../drivers/iommu/iommufd/main.c:311:27: note: 'buf' declared here 311 | union ucmd_buffer buf; | ^~~ Cc: Christian Brauner <brauner@kernel.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Alexander Potapenko <glider@google.com> Acked-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/lkml/20230203193523.never.667-kees@kernel.org/ Signed-off-by: Sasha Levin <sashal@kernel.org>