summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2023-11-20netfilter: nft_redir: use `struct nf_nat_range2` throughout and deduplicate ↵Jeremy Sowden
eval call-backs [ Upstream commit 6f56ad1b92328997e1b1792047099df6f8d7acb5 ] `nf_nat_redirect_ipv4` takes a `struct nf_nat_ipv4_multi_range_compat`, but converts it internally to a `struct nf_nat_range2`. Change the function to take the latter, factor out the code now shared with `nf_nat_redirect_ipv6`, move the conversion to the xt_REDIRECT module, and update the ipv4 range initialization in the nft_redir module. Replace a bare hex constant for 127.0.0.1 with a macro. Remove `WARN_ON`. `nf_nat_setup_info` calls `nf_ct_is_confirmed`: /* Can't setup nat info for confirmed ct. */ if (nf_ct_is_confirmed(ct)) return NF_ACCEPT; This means that `ct` cannot be null or the kernel will crash, and implies that `ctinfo` is `IP_CT_NEW` or `IP_CT_RELATED`. nft_redir has separate ipv4 and ipv6 call-backs which share much of their code, and an inet one switch containing a switch that calls one of the others based on the family of the packet. Merge the ipv4 and ipv6 ones into the inet one in order to get rid of the duplicate code. Const-qualify the `priv` pointer since we don't need to write through it. Assign `priv->flags` to the range instead of OR-ing it in. Set the `NF_NAT_RANGE_PROTO_SPECIFIED` flag once during init, rather than on every eval. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de> Stable-dep-of: 80abbe8a8263 ("netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20Fix termination state for idr_for_each_entry_ul()NeilBrown
[ Upstream commit e8ae8ad479e2d037daa33756e5e72850a7bd37a9 ] The comment for idr_for_each_entry_ul() states after normal termination @entry is left with the value NULL This is not correct in the case where UINT_MAX has an entry in the idr. In that case @entry will be non-NULL after termination. No current code depends on the documentation being correct, but to save future code we should fix it. Also fix idr_for_each_entry_continue_ul(). While this is not documented as leaving @entry as NULL, the mellanox driver appears to depend on it doing so. So make that explicit in the documentation as well as in the code. Fixes: e33d2b74d805 ("idr: fix overflow case for idr_for_each_entry_ul()") Cc: Matthew Wilcox <willy@infradead.org> Cc: Chris Mi <chrism@mellanox.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20inet: shrink struct flowi_commonEric Dumazet
[ Upstream commit 1726483b79a72e0150734d5367e4a0238bf8fcff ] I am looking at syzbot reports triggering kernel stack overflows involving a cascade of ipvlan devices. We can save 8 bytes in struct flowi_common. This patch alone will not fix the issue, but is a start. Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: wenxu <wenxu@ucloud.cn> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231025141037.3448203-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20mfd: core: Un-constify mfd_cell.of_regMichał Mirosław
[ Upstream commit 3c70342f1f0045dc827bb2f02d814ce31e0e0d05 ] Enable dynamically filling in the whole mfd_cell structure. All other fields already allow that. Fixes: 466a62d7642f ("mfd: core: Make a best effort attempt to match devices with the correct of_nodes") Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Link: https://lore.kernel.org/r/b73fe4bc4bd6ba1af90940a640ed65fe254c0408.1693253717.git.mirq-linux@rere.qmqm.pl Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20padata: Convert from atomic_t to refcount_t on parallel_data->refcntXiyu Yang
[ Upstream commit d5ee8e750c9449e9849a09ce6fb6b8adeaa66adc ] refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow and further use-after-free situations. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Stable-dep-of: 7ddc21e317b3 ("padata: Fix refcnt handling in padata_free_shell()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20arm64/arm: xen: enlighten: Fix KPTI checksMark Rutland
[ Upstream commit 20f3b8eafe0ba5d3c69d5011a9b07739e9645132 ] When KPTI is in use, we cannot register a runstate region as XEN requires that this is always a valid VA, which we cannot guarantee. Due to this, xen_starting_cpu() must avoid registering each CPU's runstate region, and xen_guest_init() must avoid setting up features that depend upon it. We tried to ensure that in commit: f88af7229f6f22ce (" xen/arm: do not setup the runstate info page if kpti is enabled") ... where we added checks for xen_kernel_unmapped_at_usr(), which wraps arm64_kernel_unmapped_at_el0() on arm64 and is always false on 32-bit arm. Unfortunately, as xen_guest_init() is an early_initcall, this happens before secondary CPUs are booted and arm64 has finalized the ARM64_UNMAP_KERNEL_AT_EL0 cpucap which backs arm64_kernel_unmapped_at_el0(), and so this can subsequently be set as secondary CPUs are onlined. On a big.LITTLE system where the boot CPU does not require KPTI but some secondary CPUs do, this will result in xen_guest_init() intializing features that depend on the runstate region, and xen_starting_cpu() registering the runstate region on some CPUs before KPTI is subsequent enabled, resulting the the problems the aforementioned commit tried to avoid. Handle this more robsutly by deferring the initialization of the runstate region until secondary CPUs have been initialized and the ARM64_UNMAP_KERNEL_AT_EL0 cpucap has been finalized. The per-cpu work is moved into a new hotplug starting function which is registered later when we're certain that KPTI will not be used. Fixes: f88af7229f6f ("xen/arm: do not setup the runstate info page if kpti is enabled") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Bertrand Marquis <bertrand.marquis@arm.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20clk: linux/clk-provider.h: fix kernel-doc warnings and typosRandy Dunlap
[ Upstream commit 84aefafe6b294041b7fa0757414c4a29c1bdeea2 ] Fix spelling of "Structure". Fix multiple kernel-doc warnings: clk-provider.h:269: warning: Function parameter or member 'recalc_rate' not described in 'clk_ops' clk-provider.h:468: warning: Function parameter or member 'parent_data' not described in 'clk_hw_register_fixed_rate_with_accuracy_parent_data' clk-provider.h:468: warning: Excess function parameter 'parent_name' description in 'clk_hw_register_fixed_rate_with_accuracy_parent_data' clk-provider.h:482: warning: Function parameter or member 'parent_data' not described in 'clk_hw_register_fixed_rate_parent_accuracy' clk-provider.h:482: warning: Excess function parameter 'parent_name' description in 'clk_hw_register_fixed_rate_parent_accuracy' clk-provider.h:687: warning: Function parameter or member 'flags' not described in 'clk_divider' clk-provider.h:1164: warning: Function parameter or member 'flags' not described in 'clk_fractional_divider' clk-provider.h:1164: warning: Function parameter or member 'approximation' not described in 'clk_fractional_divider' clk-provider.h:1213: warning: Function parameter or member 'flags' not described in 'clk_multiplier' Fixes: 9fba738a53dd ("clk: add duty cycle support") Fixes: b2476490ef11 ("clk: introduce the common clock framework") Fixes: 2d34f09e79c9 ("clk: fixed-rate: Add support for specifying parents via DT/pointers") Fixes: f5290d8e4f0c ("clk: asm9260: use parent index to link the reference clock") Fixes: 9d9f78ed9af0 ("clk: basic clock hardware types") Fixes: e2d0e90fae82 ("clk: new basic clk type for fractional divider") Fixes: f2e0a53271a4 ("clk: Add a basic multiplier clock") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Michael Turquette <mturquette@baylibre.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: linux-clk@vger.kernel.org Link: https://lore.kernel.org/r/20230930221428.18463-1-rdunlap@infradead.org Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20clk: asm9260: use parent index to link the reference clockDmitry Baryshkov
[ Upstream commit f5290d8e4f0caa81a491448a27dd70e726095d07 ] Rewrite clk-asm9260 to use parent index to use the reference clock. During this rework two helpers are added: - clk_hw_register_mux_table_parent_data() to supplement clk_hw_register_mux_table() but using parent_data instead of parent_names - clk_hw_register_fixed_rate_parent_accuracy() to be used instead of directly calling __clk_hw_register_fixed_rate(). The later function is an internal API, which is better not to be called directly. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20220916061740.87167-2-dmitry.baryshkov@linaro.org Signed-off-by: Stephen Boyd <sboyd@kernel.org> Stable-dep-of: 84aefafe6b29 ("clk: linux/clk-provider.h: fix kernel-doc warnings and typos") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20net: add DEV_STATS_READ() helperEric Dumazet
[ Upstream commit 0b068c714ca9479d2783cc333fff5bc2d4a6d45c ] Companion of DEV_STATS_INC() & DEV_STATS_ADD(). This is going to be used in the series. Use it in macsec_get_stats64(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: ff672b9ffeb3 ("ipvlan: properly track tx_errors") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20tcp: fix cookie_init_timestamp() overflowsEric Dumazet
[ Upstream commit 73ed8e03388d16c12fc577e5c700b58a29045a15 ] cookie_init_timestamp() is supposed to return a 64bit timestamp suitable for both TSval determination and setting of skb->tstamp. Unfortunately it uses 32bit fields and overflows after 2^32 * 10^6 nsec (~49 days) of uptime. Generated TSval are still correct, but skb->tstamp might be set far away in the past, potentially confusing other layers. tcp_ns_to_ts() is changed to return a full 64bit value, ts and ts_now variables are changed to u64 type, and TSMASK is removed in favor of shifts operations. While we are at it, change this sequence: ts >>= TSBITS; ts--; ts <<= TSBITS; ts |= options; to: ts -= (1UL << TSBITS); Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20overflow: Implement size_t saturating arithmetic helpersKees Cook
[ Upstream commit e1be43d9b5d0d1310dbd90185a8e5c7145dde40f ] In order to perform more open-coded replacements of common allocation size arithmetic, the kernel needs saturating (SIZE_MAX) helpers for multiplication, addition, and subtraction. For example, it is common in allocators, especially on realloc, to add to an existing size: p = krealloc(map->patch, sizeof(struct reg_sequence) * (map->patch_regs + num_regs), GFP_KERNEL); There is no existing saturating replacement for this calculation, and just leaving the addition open coded inside array_size() could potentially overflow as well. For example, an overflow in an expression for a size_t argument might wrap to zero: array_size(anything, something_at_size_max + 1) == 0 Introduce size_mul(), size_add(), and size_sub() helpers that implicitly promote arguments to size_t and saturated calculations for use in allocations. With these helpers it is also possible to redefine array_size(), array3_size(), flex_array_size(), and struct_size() in terms of the new helpers. As with the check_*_overflow() helpers, the new helpers use __must_check, though what is really desired is a way to make sure that assignment is only to a size_t lvalue. Without this, it's still possible to introduce overflow/underflow via type conversion (i.e. from size_t to int). Enforcing this will currently need to be left to static analysis or future use of -Wconversion. Additionally update the overflow unit tests to force runtime evaluation for the pathological cases. Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Keith Busch <kbusch@kernel.org> Cc: Len Baker <len.baker@gmx.com> Signed-off-by: Kees Cook <keescook@chromium.org> Stable-dep-of: d692873cbe86 ("gve: Use size_add() in call to struct_size()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-08PCI: Prevent xHCI driver from claiming AMD VanGogh USB3 DRD deviceVicki Pfau
commit 7e6f3b6d2c352b5fde37ce3fed83bdf6172eebd4 upstream. The AMD VanGogh SoC contains a DesignWare USB3 Dual-Role Device that can be operated as either a USB Host or a USB Device, similar to on the AMD Nolan platform. be6646bfbaec ("PCI: Prevent xHCI driver from claiming AMD Nolan USB3 DRD device") added a quirk to let the dwc3 driver claim the Nolan device since it provides more specific support. Extend that quirk to include the VanGogh SoC USB3 device. Link: https://lore.kernel.org/r/20230927202212.2388216-1-vi@endrift.com Signed-off-by: Vicki Pfau <vi@endrift.com> [bhelgaas: include be6646bfbaec reference, add stable tag] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v3.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08can: isotp: add local echo tx processing and tx without FCOliver Hartkopp
commit 4b7fe92c06901f4563af0e36d25223a5ab343782 upstream commit 9f39d36530e5678d092d53c5c2c60d82b4dcc169 upstream commit 051737439eaee5bdd03d3c2ef5510d54a478fd05 upstream Due to the existing patch order applied to isotp.c in the stable kernel the original order of depending patches the three original patches 4b7fe92c0690 ("can: isotp: add local echo tx processing for consecutive frames") 9f39d36530e5 ("can: isotp: add support for transmission without flow control") 051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()") can not be split into different patches that can be applied in working steps to the stable tree. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08rpmsg: Fix calling device_lock() on non-initialized deviceKrzysztof Kozlowski
commit bb17d110cbf270d5247a6e261c5ad50e362d1675 upstream. driver_set_override() helper uses device_lock() so it should not be called before rpmsg_register_device() (which calls device_register()). Effect can be seen with CONFIG_DEBUG_MUTEXES: DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 3 PID: 57 at kernel/locking/mutex.c:582 __mutex_lock+0x1ec/0x430 ... Call trace: __mutex_lock+0x1ec/0x430 mutex_lock_nested+0x44/0x50 driver_set_override+0x124/0x150 qcom_glink_native_probe+0x30c/0x3b0 glink_rpm_probe+0x274/0x350 platform_probe+0x6c/0xe0 really_probe+0x17c/0x3d0 __driver_probe_device+0x114/0x190 driver_probe_device+0x3c/0xf0 ... Refactor the rpmsg_register_device() function to use two-step device registering (initialization + add) and call driver_set_override() in proper moment. This moves the code around, so while at it also NULL-ify the rpdev->driver_override in error path to be sure it won't be kfree() second time. Fixes: 42cd402b8fd4 ("rpmsg: Fix kfree() of static memory on setting driver_override") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20220429195946.1061725-2-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08rpmsg: Fix kfree() of static memory on setting driver_overrideKrzysztof Kozlowski
commit 42cd402b8fd4672b692400fe5f9eecd55d2794ac upstream. The driver_override field from platform driver should not be initialized from static memory (string literal) because the core later kfree() it, for example when driver_override is set via sysfs. Use dedicated helper to set driver_override properly. Fixes: 950a7388f02b ("rpmsg: Turn name service into a stand alone driver") Fixes: c0cdc19f84a4 ("rpmsg: Driver for user space endpoint interface") Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220419113435.246203-13-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08driver: platform: Add helper for safer setting of driver_overrideKrzysztof Kozlowski
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream. Several core drivers and buses expect that driver_override is a dynamically allocated memory thus later they can kfree() it. However such assumption is not documented, there were in the past and there are already users setting it to a string literal. This leads to kfree() of static memory during device release (e.g. in error paths or during unbind): kernel BUG at ../mm/slub.c:3960! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM ... (kfree) from [<c058da50>] (platform_device_release+0x88/0xb4) (platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90) (device_release) from [<c0a69050>] (kobject_put+0xec/0x20c) (kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c) (exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4) (platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414) (really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4) (driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8) (bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c) (__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90) (bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c) (device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc) (of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc) (of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc) (of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118) (of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8) (of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404) Provide a helper which clearly documents the usage of driver_override. This will allow later to reuse the helper and reduce the amount of duplicated code. Convert the platform driver to use a new helper and make the driver_override field const char (it is not modified by the core). Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08kasan: print the original fault addr when access invalid shadowHaibo Li
commit babddbfb7d7d70ae7f10fedd75a45d8ad75fdddf upstream. when the checked address is illegal,the corresponding shadow address from kasan_mem_to_shadow may have no mapping in mmu table. Access such shadow address causes kernel oops. Here is a sample about oops on arm64(VA 39bit) with KASAN_SW_TAGS and KASAN_OUTLINE on: [ffffffb80aaaaaaa] pgd=000000005d3ce003, p4d=000000005d3ce003, pud=000000005d3ce003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: CPU: 3 PID: 100 Comm: sh Not tainted 6.6.0-rc1-dirty #43 Hardware name: linux,dummy-virt (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __hwasan_load8_noabort+0x5c/0x90 lr : do_ib_ob+0xf4/0x110 ffffffb80aaaaaaa is the shadow address for efffff80aaaaaaaa. The problem is reading invalid shadow in kasan_check_range. The generic kasan also has similar oops. It only reports the shadow address which causes oops but not the original address. Commit 2f004eea0fc8("x86/kasan: Print original address on #GP") introduce to kasan_non_canonical_hook but limit it to KASAN_INLINE. This patch extends it to KASAN_OUTLINE mode. Link: https://lkml.kernel.org/r/20231009073748.159228-1-haibo.li@mediatek.com Fixes: 2f004eea0fc8("x86/kasan: Print original address on #GP") Signed-off-by: Haibo Li <haibo.li@mediatek.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Haibo Li <haibo.li@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-11-08gtp: uapi: fix GTPA_MAXPablo Neira Ayuso
[ Upstream commit adc8df12d91a2b8350b0cd4c7fec3e8546c9d1f8 ] Subtract one to __GTPA_MAX, otherwise GTPA_MAX is off by 2. Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25Bluetooth: hci_sock: Correctly bounds check and pad HCI_MON_NEW_INDEX nameKees Cook
commit cb3871b1cd135a6662b732fbc6b3db4afcdb4a64 upstream. The code pattern of memcpy(dst, src, strlen(src)) is almost always wrong. In this case it is wrong because it leaves memory uninitialized if it is less than sizeof(ni->name), and overflows ni->name when longer. Normally strtomem_pad() could be used here, but since ni->name is a trailing array in struct hci_mon_new_index, compilers that don't support -fstrict-flex-arrays=3 can't tell how large this array is via __builtin_object_size(). Instead, open-code the helper and use sizeof() since it will work correctly. Additionally mark ni->name as __nonstring since it appears to not be a %NUL terminated C string. Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Cc: Edward AD <twuufnxlz@gmail.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Fixes: 18f547f3fc07 ("Bluetooth: hci_sock: fix slab oob read in create_monitor_event") Link: https://lore.kernel.org/lkml/202310110908.F2639D3276@keescook/ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25perf: Disallow mis-matched inherited group readsPeter Zijlstra
commit 32671e3799ca2e4590773fd0e63aaa4229e50c06 upstream. Because group consistency is non-atomic between parent (filedesc) and children (inherited) events, it is possible for PERF_FORMAT_GROUP read() to try and sum non-matching counter groups -- with non-sensical results. Add group_generation to distinguish the case where a parent group removes and adds an event and thus has the same number, but a different configuration of events as inherited groups. This became a problem when commit fa8c269353d5 ("perf/core: Invert perf_read_group() loops") flipped the order of child_list and sibling_list. Previously it would iterate the group (sibling_list) first, and for each sibling traverse the child_list. In this order, only the group composition of the parent is relevant. By flipping the order the group composition of the child (inherited) events becomes an issue and the mis-match in group composition becomes evident. That said; even prior to this commit, while reading of a group that is not equally inherited was not broken, it still made no sense. (Ab)use ECHILD as error return to indicate issues with child process group composition. Fixes: fa8c269353d5 ("perf/core: Invert perf_read_group() loops") Reported-by: Budimir Markovic <markovicbudimir@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231018115654.GK33217@noisy.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25ipv4/fib: send notify when delete source address routesHangbin Liu
[ Upstream commit 4b2b606075e50cdae62ab2356b0a1e206947c354 ] After deleting an interface address in fib_del_ifaddr(), the function scans the fib_info list for stray entries and calls fib_flush() and fib_table_flush(). Then the stray entries will be deleted silently and no RTM_DELROUTE notification will be sent. This lack of notification can make routing daemons, or monitor like `ip monitor route` miss the routing changes. e.g. + ip link add dummy1 type dummy + ip link add dummy2 type dummy + ip link set dummy1 up + ip link set dummy2 up + ip addr add 192.168.5.5/24 dev dummy1 + ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5 + ip -4 route 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 + ip monitor route + ip addr del 192.168.5.5/24 dev dummy1 Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5 Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5 As Ido reminded, fib_table_flush() isn't only called when an address is deleted, but also when an interface is deleted or put down. The lack of notification in these cases is deliberate. And commit 7c6bb7d2faaf ("net/ipv6: Add knob to skip DELROUTE message on device down") introduced a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send the route delete notify blindly in fib_table_flush(). To fix this issue, let's add a new flag in "struct fib_info" to track the deleted prefer source address routes, and only send notify for them. After update: + ip monitor route + ip addr del 192.168.5.5/24 dev dummy1 Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5 Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5 Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5 Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5 Suggested-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230922075508.848925-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25Bluetooth: hci_core: Fix build warningsLuiz Augusto von Dentz
[ Upstream commit dcda165706b9fbfd685898d46a6749d7d397e0c0 ] This fixes the following warnings: net/bluetooth/hci_core.c: In function ‘hci_register_dev’: net/bluetooth/hci_core.c:2620:54: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size 5 [-Wformat-truncation=] 2620 | snprintf(hdev->name, sizeof(hdev->name), "hci%d", id); | ^~ net/bluetooth/hci_core.c:2620:50: note: directive argument in the range [0, 2147483647] 2620 | snprintf(hdev->name, sizeof(hdev->name), "hci%d", id); | ^~~~~~~ net/bluetooth/hci_core.c:2620:9: note: ‘snprintf’ output between 5 and 14 bytes into a destination of size 8 2620 | snprintf(hdev->name, sizeof(hdev->name), "hci%d", id); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25drm/connector: Add support for out-of-band hotplug notification (v3)Hans de Goede
[ Upstream commit 72ad49682dde3d9de5708b8699dc8e0b44962322 ] Add a new drm_connector_oob_hotplug_event() function and oob_hotplug_event drm_connector_funcs member. On some hardware a hotplug event notification may come from outside the display driver / device. An example of this is some USB Type-C setups where the hardware muxes the DisplayPort data and aux-lines but does not pass the altmode HPD status bit to the GPU's DP HPD pin. In cases like this the new drm_connector_oob_hotplug_event() function can be used to report these out-of-band events. Changes in v2: - Make drm_connector_oob_hotplug_event() take a fwnode as argument and have it call drm_connector_find_by_fwnode() internally. This allows making drm_connector_find_by_fwnode() a drm-internal function and avoids code outside the drm subsystem potentially holding on the a drm_connector reference for a longer period. Changes in v3: - Drop the data argument to the drm_connector_oob_hotplug_event function since it is not used atm. This can be re-added later when a use for it actually arises. Tested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Link: https://lore.kernel.org/r/20210817215201.795062-5-hdegoede@redhat.com Stable-dep-of: 89434b069e46 ("usb: typec: altmodes/displayport: Signal hpd low when exiting mode") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25drm/connector: Add drm_connector_find_by_fwnode() function (v3)Hans de Goede
[ Upstream commit 3d3f7c1e68691574c1d87cd0f9f2348323bc0199 ] Add a function to find a connector based on a fwnode. This will be used by the new drm_connector_oob_hotplug_event() function which is added by the next patch in this patch-set. Changes in v2: - Complete rewrite to use a global connector list in drm_connector.c rather then using a class-dev-iter in drm_sysfs.c Changes in v3: - Add forward declaration for struct fwnode_handle to drm_crtc_internal.h (fixes warning reported by kernel test robot <lkp@intel.com>) Tested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Link: https://lore.kernel.org/r/20210817215201.795062-4-hdegoede@redhat.com Stable-dep-of: 89434b069e46 ("usb: typec: altmodes/displayport: Signal hpd low when exiting mode") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25drm/connector: Add a fwnode pointer to drm_connector and register with ACPI (v2)Hans de Goede
[ Upstream commit 48c429c6d18db115c277b75000152d8fa4cd35d0 ] Add a fwnode pointer to struct drm_connector and register an acpi_bus_type for the connectors with the ACPI subsystem (when CONFIG_ACPI is enabled). The adding of the fwnode pointer allows drivers to associate a fwnode that represents a connector with that connector. When the new fwnode pointer points to an ACPI-companion, then the new acpi_bus_type will cause the ACPI subsys to bind the device instantiated for the connector with the fwnode by calling acpi_bind_one(). This will result in a firmware_node symlink under /sys/class/card#-<connecter-name>/ which helps to verify that the fwnode-s and connectors are properly matched. Changes in v2: - Make drm_connector_cleanup() call fwnode_handle_put() on connector->fwnode and document this Co-developed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Tested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Link: https://lore.kernel.org/r/20210817215201.795062-3-hdegoede@redhat.com Stable-dep-of: 89434b069e46 ("usb: typec: altmodes/displayport: Signal hpd low when exiting mode") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25usb: core: Track SuperSpeed Plus GenXxYThinh Nguyen
[ Upstream commit 0299809be415567366b66f248eed93848b8dc9f3 ] Introduce ssp_rate field to usb_device structure to capture the connected SuperSpeed Plus signaling rate generation and lane count with the corresponding usb_ssp_rate enum. Signed-off-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/b7805d121e5ae4ad5ae144bd860b6ac04ee47436.1615432770.git.Thinh.Nguyen@synopsys.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Stable-dep-of: f74a7afc224a ("usb: hub: Guard against accesses to uninitialized BOS descriptors") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25resource: Add irqresource_disabled()John Garry
[ Upstream commit 9806731db684a475ade1e95d166089b9edbd9da3 ] Add a common function to set the fields for a irq resource to disabled, which mimics what is done in acpi_dev_irqresource_disabled(), with a view to replace that function. Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/1606905417-183214-3-git-send-email-john.garry@huawei.com Stable-dep-of: c1ed72171ed5 ("ACPI: resource: Skip IRQ override on ASUS ExpertBook B1402CBA") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25neighbor: tracing: Move pin6 inside CONFIG_IPV6=y sectionGeert Uytterhoeven
commit 2915240eddba96b37de4c7e9a3d0ac6f9548454b upstream. When CONFIG_IPV6=n, and building with W=1: In file included from include/trace/define_trace.h:102, from include/trace/events/neigh.h:255, from net/core/net-traces.c:51: include/trace/events/neigh.h: In function ‘trace_event_raw_event_neigh_create’: include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable] 42 | struct in6_addr *pin6; | ^~~~ include/trace/trace_events.h:402:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’ 402 | { assign; } \ | ^~~~~~ include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’ 44 | PARAMS(assign), \ | ^~~~~~ include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’ 23 | TRACE_EVENT(neigh_create, | ^~~~~~~~~~~ include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’ 41 | TP_fast_assign( | ^~~~~~~~~~~~~~ In file included from include/trace/define_trace.h:103, from include/trace/events/neigh.h:255, from net/core/net-traces.c:51: include/trace/events/neigh.h: In function ‘perf_trace_neigh_create’: include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not used [-Werror=unused-but-set-variable] 42 | struct in6_addr *pin6; | ^~~~ include/trace/perf.h:51:11: note: in definition of macro ‘DECLARE_EVENT_CLASS’ 51 | { assign; } \ | ^~~~~~ include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’ 44 | PARAMS(assign), \ | ^~~~~~ include/trace/events/neigh.h:23:1: note: in expansion of macro ‘TRACE_EVENT’ 23 | TRACE_EVENT(neigh_create, | ^~~~~~~~~~~ include/trace/events/neigh.h:41:9: note: in expansion of macro ‘TP_fast_assign’ 41 | TP_fast_assign( | ^~~~~~~~~~~~~~ Indeed, the variable pin6 is declared and initialized unconditionally, while it is only used and needlessly re-initialized when support for IPv6 is enabled. Fix this by dropping the unused variable initialization, and moving the variable declaration inside the existing section protected by a check for CONFIG_IPV6. Fixes: fc651001d2c5ca4f ("neighbor: Add tracepoint to __neigh_create") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25tcp: fix excessive TLP and RACK timeouts from HZ roundingNeal Cardwell
commit 1c2709cfff1dedbb9591e989e2f001484208d914 upstream. We discovered from packet traces of slow loss recovery on kernels with the default HZ=250 setting (and min_rtt < 1ms) that after reordering, when receiving a SACKed sequence range, the RACK reordering timer was firing after about 16ms rather than the desired value of roughly min_rtt/4 + 2ms. The problem is largely due to the RACK reorder timer calculation adding in TCP_TIMEOUT_MIN, which is 2 jiffies. On kernels with HZ=250, this is 2*4ms = 8ms. The TLP timer calculation has the exact same issue. This commit fixes the TLP transmit timer and RACK reordering timer floor calculation to more closely match the intended 2ms floor even on kernels with HZ=250. It does this by adding in a new TCP_TIMEOUT_MIN_US floor of 2000 us and then converting to jiffies, instead of the current approach of converting to jiffies and then adding th TCP_TIMEOUT_MIN value of 2 jiffies. Our testing has verified that on kernels with HZ=1000, as expected, this does not produce significant changes in behavior, but on kernels with the default HZ=250 the latency improvement can be large. For example, our tests show that for HZ=250 kernels at low RTTs this fix roughly halves the latency for the RACK reorder timer: instead of mostly firing at 16ms it mostly fires at 8ms. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: bb4d991a28cc ("tcp: adjust tail loss probe timeout") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231015174700.2206872-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25xfrm: fix a data-race in xfrm_gen_index()Eric Dumazet
commit 3e4bc23926b83c3c67e5f61ae8571602754131a6 upstream. xfrm_gen_index() mutual exclusion uses net->xfrm.xfrm_policy_lock. This means we must use a per-netns idx_generator variable, instead of a static one. Alternative would be to use an atomic variable. syzbot reported: BUG: KCSAN: data-race in xfrm_sk_policy_insert / xfrm_sk_policy_insert write to 0xffffffff87005938 of 4 bytes by task 29466 on cpu 0: xfrm_gen_index net/xfrm/xfrm_policy.c:1385 [inline] xfrm_sk_policy_insert+0x262/0x640 net/xfrm/xfrm_policy.c:2347 xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639 do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffffffff87005938 of 4 bytes by task 29460 on cpu 1: xfrm_sk_policy_insert+0x13e/0x640 xfrm_user_policy+0x413/0x540 net/xfrm/xfrm_state.c:2639 do_ipv6_setsockopt+0x1317/0x2ce0 net/ipv6/ipv6_sockglue.c:943 ipv6_setsockopt+0x57/0x130 net/ipv6/ipv6_sockglue.c:1012 rawv6_setsockopt+0x21e/0x410 net/ipv6/raw.c:1054 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00006ad8 -> 0x00006b18 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 29460 Comm: syz-executor.1 Not tainted 6.5.0-rc5-syzkaller-00243-g9106536c1aa3 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Fixes: 1121994c803f ("netns xfrm: policy insertion in netns") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25dev_forward_skb: do not scrub skb mark within the same name spaceNicolas Dichtel
commit ff70202b2d1ad522275c6aadc8c53519b6a22c57 upstream. The goal is to keep the mark during a bpf_redirect(), like it is done for legacy encapsulation / decapsulation, when there is no x-netns. This was initially done in commit 213dd74aee76 ("skbuff: Do not scrub skb mark within the same name space"). When the call to skb_scrub_packet() was added in dev_forward_skb() (commit 8b27f27797ca ("skb: allow skb_scrub_packet() to be used by tunnels")), the second argument (xnet) was set to true to force a call to skb_orphan(). At this time, the mark was always cleanned up by skb_scrub_packet(), whatever xnet value was. This call to skb_orphan() was removed later in commit 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb."). But this 'true' stayed here without any real reason. Let's correctly set xnet in ____dev_forward_skb(), this function has access to the previous interface and to the new interface. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25mcb: remove is_added flag from mcb_device structJorge Sanjuan Garcia
commit 0f28ada1fbf0054557cddcdb93ad17f767105208 upstream. When calling mcb_bus_add_devices(), both mcb devices and the mcb bus will attempt to attach a device to a driver because they share the same bus_type. This causes an issue when trying to cast the container of the device to mcb_device struct using to_mcb_device(), leading to a wrong cast when the mcb_bus is added. A crash occurs when freing the ida resources as the bus numbering of mcb_bus gets confused with the is_added flag on the mcb_device struct. The only reason for this cast was to keep an is_added flag on the mcb_device struct that does not seem necessary. The function device_attach() handles already bound devices and the mcb subsystem does nothing special with this is_added flag so remove it completely. Fixes: 18d288198099 ("mcb: Correctly initialize the bus's device") Cc: stable <stable@kernel.org> Signed-off-by: Jorge Sanjuan Garcia <jorge.sanjuangarcia@duagon.com> Co-developed-by: Jose Javier Rodriguez Barbarin <JoseJavier.Rodriguez@duagon.com> Signed-off-by: Jose Javier Rodriguez Barbarin <JoseJavier.Rodriguez@duagon.com> Link: https://lore.kernel.org/r/20230906114901.63174-2-JoseJavier.Rodriguez@duagon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25net: change accept_ra_min_rtr_lft to affect all RA lifetimesPatrick Rohr
commit 5027d54a9c30bc7ec808360378e2b4753f053f25 upstream. accept_ra_min_rtr_lft only considered the lifetime of the default route and discarded entire RAs accordingly. This change renames accept_ra_min_rtr_lft to accept_ra_min_lft, and applies the value to individual RA sections; in particular, router lifetime, PIO preferred lifetime, and RIO lifetime. If any of those lifetimes are lower than the configured value, the specific RA section is ignored. In order for the sysctl to be useful to Android, it should really apply to all lifetimes in the RA, since that is what determines the minimum frequency at which RAs must be processed by the kernel. Android uses hardware offloads to drop RAs for a fraction of the minimum of all lifetimes present in the RA (some networks have very frequent RAs (5s) with high lifetimes (2h)). Despite this, we have encountered networks that set the router lifetime to 30s which results in very frequent CPU wakeups. Instead of disabling IPv6 (and dropping IPv6 ethertype in the WiFi firmware) entirely on such networks, it seems better to ignore the misconfigured routers while still processing RAs from other IPv6 routers on the same network (i.e. to support IoT applications). The previous implementation dropped the entire RA based on router lifetime. This turned out to be hard to expand to the other lifetimes present in the RA in a consistent manner; dropping the entire RA based on RIO/PIO lifetimes would essentially require parsing the whole thing twice. Fixes: 1671bcfd76fd ("net: add sysctl accept_ra_min_rtr_lft") Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Patrick Rohr <prohr@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230726230701.919212-1-prohr@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25net: add sysctl accept_ra_min_rtr_lftPatrick Rohr
commit 1671bcfd76fdc0b9e65153cf759153083755fe4c upstream. This change adds a new sysctl accept_ra_min_rtr_lft to specify the minimum acceptable router lifetime in an RA. If the received RA router lifetime is less than the configured value (and not 0), the RA is ignored. This is useful for mobile devices, whose battery life can be impacted by networks that configure RAs with a short lifetime. On such networks, the device should never gain IPv6 provisioning and should attempt to drop RAs via hardware offload, if available. Signed-off-by: Patrick Rohr <prohr@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-25net: macsec: indicate next pn update when offloadingRadu Pirea (NXP OSS)
[ Upstream commit 0412cc846a1ef38697c3f321f9b174da91ecd3b5 ] Indicate next PN update using update_pn flag in macsec_context. Offloaded MACsec implementations does not know whether or not the MACSEC_SA_ATTR_PN attribute was passed for an SA update and assume that next PN should always updated, but this is not always true. The PN can be reset to its initial value using the following command: $ ip macsec set macsec0 tx sa 0 off #octeontx2-pf case Or, the update PN command will succeed even if the driver does not support PN updates. $ ip macsec set macsec0 tx sa 0 pn 1 on #mscc phy driver case Comparing the initial PN with the new PN value is not a solution. When the user updates the PN using its initial value the command will succeed, even if the driver does not support it. Like this: $ ip macsec add macsec0 tx sa 0 pn 1 on key 00 \ ead3664f508eb06c40ac7104cdae4ce5 $ ip macsec set macsec0 tx sa 0 pn 1 on #mlx5 case Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: e0a8c918daa5 ("net: phy: mscc: macsec: reject PN update requests") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-25quota: Fix slow quotaoffJan Kara
commit 869b6ea1609f655a43251bf41757aa44e5350a8f upstream. Eric has reported that commit dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") heavily increases runtime of generic/270 xfstest for ext4 in nojournal mode. The reason for this is that ext4 in nojournal mode leaves dquots dirty until the last dqput() and thus the cleanup done in quota_release_workfn() has to write them all. Due to the way quota_release_workfn() is written this results in synchronize_srcu() call for each dirty dquot which makes the dquot cleanup when turning quotas off extremely slow. To be able to avoid synchronize_srcu() for each dirty dquot we need to rework how we track dquots to be cleaned up. Instead of keeping the last dquot reference while it is on releasing_dquots list, we drop it right away and mark the dquot with new DQ_RELEASING_B bit instead. This way we can we can remove dquot from releasing_dquots list when new reference to it is acquired and thus there's no need to call synchronize_srcu() each time we drop dq_list_lock. References: https://lore.kernel.org/all/ZRytn6CxFK2oECUt@debian-BULLSEYE-live-builder-AMD64 Reported-by: Eric Whitney <enwlinux@gmail.com> Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10netfilter: nf_tables: fix kdoc warnings after gc reworkFlorian Westphal
commit 08713cb006b6f07434f276c5ee214fb20c7fd965 upstream. Jakub Kicinski says: We've got some new kdoc warnings here: net/netfilter/nft_set_pipapo.c:1557: warning: Function parameter or member '_set' not described in 'pipapo_gc' net/netfilter/nft_set_pipapo.c:1557: warning: Excess function parameter 'set' description in 'pipapo_gc' include/net/netfilter/nf_tables.h:577: warning: Function parameter or member 'dead' not described in 'nft_set' Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20230810104638.746e46f1@kernel.org/ Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10tcp: fix quick-ack counting to count actual ACKs of new dataNeal Cardwell
[ Upstream commit 059217c18be6757b95bfd77ba53fb50b48b8a816 ] This commit fixes quick-ack counting so that it only considers that a quick-ack has been provided if we are sending an ACK that newly acknowledges data. The code was erroneously using the number of data segments in outgoing skbs when deciding how many quick-ack credits to remove. This logic does not make sense, and could cause poor performance in request-response workloads, like RPC traffic, where requests or responses can be multi-segment skbs. When a TCP connection decides to send N quick-acks, that is to accelerate the cwnd growth of the congestion control module controlling the remote endpoint of the TCP connection. That quick-ack decision is purely about the incoming data and outgoing ACKs. It has nothing to do with the outgoing data or the size of outgoing data. And in particular, an ACK only serves the intended purpose of allowing the remote congestion control to grow the congestion window quickly if the ACK is ACKing or SACKing new data. The fix is simple: only count packets as serving the goal of the quickack mechanism if they are ACKing/SACKing new data. We can tell whether this is the case by checking inet_csk_ack_scheduled(), since we schedule an ACK exactly when we are ACKing/SACKing new data. Fixes: fc6415bcb0f5 ("[TCP]: Fix quick-ack decrementing with TSO.") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231001151239.1866845-1-ncardwell.sw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10netfilter: handle the connecting collision properly in nf_conntrack_proto_sctpXin Long
[ Upstream commit 8e56b063c86569e51eed1c5681ce6361fa97fc7a ] In Scenario A and B below, as the delayed INIT_ACK always changes the peer vtag, SCTP ct with the incorrect vtag may cause packet loss. Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK 192.168.1.2 > 192.168.1.1: [INIT] [init tag: 1328086772] 192.168.1.1 > 192.168.1.2: [INIT] [init tag: 1414468151] 192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag: 1328086772] 192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag: 1650211246] * 192.168.1.2 > 192.168.1.1: [COOKIE ECHO] 192.168.1.1 > 192.168.1.2: [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: [COOKIE ACK] Scenario B: INIT_ACK is delayed until the peer completes its own handshake 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] 192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO] 192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] * This patch fixes it as below: In SCTP_CID_INIT processing: - clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir]. (Scenario E) - set ct->proto.sctp.init[dir]. In SCTP_CID_INIT_ACK processing: - drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] && ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C) - drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A) In SCTP_CID_COOKIE_ACK processing: - clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir]. (Scenario D) Also, it's important to allow the ct state to move forward with cookie_echo and cookie_ack from the opposite dir for the collision scenarios. There are also other Scenarios where it should allow the packet through, addressed by the processing above: Scenario C: new CT is created by INIT_ACK. Scenario D: start INIT on the existing ESTABLISHED ct. Scenario E: start INIT after the old collision on the existing ESTABLISHED ct. 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408] 192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885] (both side are stopped, then start new connection again in hours) 192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 242308742] Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10bpf: Fix tr dereferencingLeon Hwang
[ Upstream commit b724a6418f1f853bcb39c8923bf14a50c7bdbd07 ] Fix 'tr' dereferencing bug when CONFIG_BPF_JIT is turned off. When CONFIG_BPF_JIT is turned off, 'bpf_trampoline_get()' returns NULL, which is same as the cases when CONFIG_BPF_JIT is turned on. Closes: https://lore.kernel.org/r/202309131936.5Nc8eUD0-lkp@intel.com/ Fixes: f7b12b6fea00 ("bpf: verifier: refactor check_attach_btf_id()") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230917153846.88732-1-hffilwlqm@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10ata: libata-sata: increase PMP SRST timeout to 10sMatthias Schiffer
commit 753a4d531bc518633ea88ac0ed02b25a16823d51 upstream. On certain SATA controllers, softreset fails after wakeup from S2RAM with the message "softreset failed (1st FIS failed)", sometimes resulting in drives not being detected again. With the increased timeout, this issue is avoided. Instead, "softreset failed (device not ready)" is now logged 1-2 times; this later failure seems to cause fewer problems however, and the drives are detected reliably once they've spun up and the probe is retried. The issue was observed with the primary SATA controller of the QNAP TS-453B, which is an "Intel Corporation Celeron/Pentium Silver Processor SATA Controller [8086:31e3] (rev 06)" integrated in the Celeron J4125 CPU, and the following drives: - Seagate IronWolf ST12000VN0008 - Seagate IronWolf ST8000NE0004 The SATA controller seems to be more relevant to this issue than the drives, as the same drives are always detected reliably on the secondary SATA controller on the same board (an ASMedia 106x) without any "softreset failed" errors even without the increased timeout. Fixes: e7d3ef13d52a ("libata: change drive ready wait after hard reset to 5s") Cc: stable@vger.kernel.org Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10bpf: Fix BTF_ID symbol generation collisionJiri Olsa
commit 8f908db77782630c45ba29dac35c434b5ce0b730 upstream. Marcus and Satya reported an issue where BTF_ID macro generates same symbol in separate objects and that breaks final vmlinux link. ld.lld: error: ld-temp.o <inline asm>:14577:1: symbol '__BTF_ID__struct__cgroup__624' is already defined This can be triggered under specific configs when __COUNTER__ happens to be the same for the same symbol in two different translation units, which is already quite unlikely to happen. Add __LINE__ number suffix to make BTF_ID symbol more unique, which is not a complete fix, but it would help for now and meanwhile we can work on better solution as suggested by Andrii. Cc: stable@vger.kernel.org Reported-by: Satya Durga Srinivasu Prabhala <quic_satyap@quicinc.com> Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com> Closes: https://github.com/ClangBuiltLinux/linux/issues/1913 Debugged-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4Bzb5KQ2_LmhN769ifMeSJaWfebccUasQOfQKaOd0nQ51tw@mail.gmail.com/ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230915-bpf_collision-v3-1-263fc519c21f@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10ACPI: Check StorageD3Enable _DSD property in ACPI codeMario Limonciello
[ Upstream commit 2744d7a0733503931b71c00d156119ced002f22c ] Although first implemented for NVME, this check may be usable by other drivers as well. Microsoft's specification explicitly mentions that is may be usable by SATA and AHCI devices. Google also indicates that they have used this with SDHCI in a downstream kernel tree that a user can plug a storage device into. Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro Suggested-by: Keith Busch <kbusch@kernel.org> CC: Shyam-sundar S-k <Shyam-sundar.S-k@amd.com> CC: Alexander Deucher <Alexander.Deucher@amd.com> CC: Rafael J. Wysocki <rjw@rjwysocki.net> CC: Prike Liang <prike.liang@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Stable-dep-of: dad651b2a44e ("nvme-pci: do not set the NUMA node of device if it has none") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10cgroup: Fix suspicious rcu_dereference_check() usage warningChengming Zhou
commit f2aa197e4794bf4c2c0c9570684f86e6fa103e8b upstream. task_css_set_check() will use rcu_dereference_check() to check for rcu_read_lock_held() on the read-side, which is not true after commit dc6e0818bc9a ("sched/cpuacct: Optimize away RCU read lock"). This commit drop explicit rcu_read_lock(), change to RCU-sched read-side critical section. So fix the RCU warning by adding check for rcu_read_lock_sched_held(). Fixes: dc6e0818bc9a ("sched/cpuacct: Optimize away RCU read lock") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: syzbot+16e3f2c77e7c5a0113f9@syzkaller.appspotmail.com Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20220305034103.57123-1-zhouchengming@bytedance.com Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10sched/cpuacct: Optimize away RCU read lockChengming Zhou
commit dc6e0818bc9a0336d9accf3ea35d146d72aa7a18 upstream. Since cpuacct_charge() is called from the scheduler update_curr(), we must already have rq lock held, then the RCU read lock can be optimized away. And do the same thing in it's wrapper cgroup_account_cputime(), but we can't use lockdep_assert_rq_held() there, which defined in kernel/sched/sched.h. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220220051426.5274-2-zhouchengming@bytedance.com [OP: adjusted lockdep_assert_rq_held() -> lockdep_assert_held()] Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10bpf: Clarify error expectations from bpf_clone_redirectStanislav Fomichev
[ Upstream commit 7cb779a6867fea00b4209bcf6de2f178a743247d ] Commit 151e887d8ff9 ("veth: Fixing transmit return status for dropped packets") exposed the fact that bpf_clone_redirect is capable of returning raw NET_XMIT_XXX return codes. This is in the conflict with its UAPI doc which says the following: "0 on success, or a negative error in case of failure." Update the UAPI to reflect the fact that bpf_clone_redirect can return positive error numbers, but don't explicitly define their meaning. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230911194731.286342-1-sdf@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10ata: libata: Rename link flag ATA_LFLAG_NO_DB_DELAYPaul Menzel
[ Upstream commit b9ba367c513dbc165dd6c01266a59db4be2a3564 ] Rename the link flag ATA_LFLAG_NO_DB_DELAY to ATA_LFLAG_NO_DEBOUNCE_DELAY. The new name is longer, but clearer. Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Stable-dep-of: 2a2df98ec592 ("ata: ahci: Add Elkhart Lake AHCI controller") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10netfilter: nf_tables: add and use nft_thoff helperFlorian Westphal
[ Upstream commit 2d7b4ace0754ebaaf71c6824880178d46aa0ab33 ] This allows to change storage placement later on without changing readers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Stable-dep-of: 28427f368f0e ("netfilter: nft_exthdr: Fix non-linear header modification") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10netfilter: nf_tables: add and use nft_sk helperFlorian Westphal
[ Upstream commit 85554eb981e5a8b0b8947611193aef1737081ef2 ] This allows to change storage placement later on without changing readers. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Stable-dep-of: 28427f368f0e ("netfilter: nft_exthdr: Fix non-linear header modification") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10netfilter: nft_exthdr: Support SCTP chunksPhil Sutter
[ Upstream commit 133dc203d77dff617d9c4673973ef3859be2c476 ] Chunks are SCTP header extensions similar in implementation to IPv6 extension headers or TCP options. Reusing exthdr expression to find and extract field values from them is therefore pretty straightforward. For now, this supports extracting data from chunks at a fixed offset (and length) only - chunks themselves are an extensible data structure; in order to make all fields available, a nested extension search is needed. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Stable-dep-of: 28427f368f0e ("netfilter: nft_exthdr: Fix non-linear header modification") Signed-off-by: Sasha Levin <sashal@kernel.org>