summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
4 daysMerge branch 'selftests-forwarding-fix-br_netfilter-related-test-failures'ipvs/mainipvs/HEADPaolo Abeni
Aleksei Oladko says: ==================== selftests: forwarding: fix br_netfilter related test failures This patch series fixes kselftests that fail when the br_nefilter module is loaded. The failures occur because the tests generate packets that are either modified or encapsulated, but their IP headers are not fully correct for sanity checks performed by be_netfilter. Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com> ==================== Link: https://patch.msgid.link/20260213131907.43351-1-aleksey.oladko@virtuozzo.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysselftests: forwarding: fix pedit tests failure with br_netfilter enabledAleksei Oladko
The tests use the tc pedit action to modify the IPv4 source address ("pedit ex munge ip src set"), but the IP header checksum is not recalculated after the modification. As a result, the modified packet fails sanity checks in br_netfilter after bridging and is dropped, which causes the test to fail. Fix this by ensuring net.bridge.bridge-nf-call-iptables is set to 0 during the test execution. This prevents the bridge from passing L2 traffic to netfilter, bypassing the checksum validation that causes the test failure. Fixes: 92ad3828944e ("selftests: forwarding: Add a test for pedit munge SIP and DIP") Fixes: 226657ba2389 ("selftests: forwarding: Add a forwarding test for pedit munge dsfield") Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260213131907.43351-4-aleksey.oladko@virtuozzo.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysselftests: forwarding: vxlan_bridge_1d_ipv6: fix test failure with ↵Aleksei Oladko
br_netfilter enabled The test generates VXLAN traffic using mausezahn, where the encapsulated inner IPv6 packet has an incorrect payload length set in the IPv6 header. After VXLAN decapsulation, such packets do not pass sanity checks in br_netfilter and are dropped, which causes the test to fail. Fix this by setting the correct IPv6 payload length for the encapsulated packet generated by mausezahn, so that the packet is accepted by br_netfilter. tools/testing/selftests/net/forwarding/vxlan_bridge_1d_ipv6.sh lines 698-706 )"00:03:"$( : Payload length )"3a:"$( : Next header )"04:"$( : Hop limit )"$saddr:"$( : IP saddr )"$daddr:"$( : IP daddr )"80:"$( : ICMPv6.type )"00:"$( : ICMPv6.code )"00:"$( : ICMPv6.checksum ) Data after IPv6 header: • 80: — 1 byte (ICMPv6 type) • 00: — 1 byte (ICMPv6 code) • 00: — 1 byte (ICMPv6 checksum, truncated) Total: 3 bytes → 00:03 is correct. The old value 00:08 did not match the actual payload size. Fixes: b07e9957f220 ("selftests: forwarding: Add VxLAN tests with a VLAN-unaware bridge for IPv6") Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260213131907.43351-3-aleksey.oladko@virtuozzo.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysselftests: forwarding: vxlan_bridge_1d: fix test failure with br_netfilter ↵Aleksei Oladko
enabled The test generates VXLAN traffic using mausezahn, where the encapsulated inner IPv4 packet contains a zero IP header checksum. After VXLAN decapsulation, such packets do not pass sanity checks in br_netfilter and are dropped, which causes the test to fail. Fix this by calculating and setting a valid IPv4 header checksum for the encapsulated packet generated by mausezahn, so that the packet is accepted by br_netfilter. Fixed by using the payload_template_calc_checksum() / payload_template_expand_checksum() helpers that are only available in v6.3 and newer kernels. Fixes: a0b61f3d8ebf ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test") Signed-off-by: Aleksei Oladko <aleksey.oladko@virtuozzo.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260213131907.43351-2-aleksey.oladko@virtuozzo.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysnet: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RTEric Dumazet
CONFIG_PREEMPT_RT is special, make this clear in backlog_lock_irq_save() and backlog_unlock_irq_restore(). The issue shows up with CONFIG_DEBUG_IRQFLAGS=y raw_local_irq_restore() called with IRQs enabled WARNING: kernel/locking/irqflag-debug.c:10 at warn_bogus_irq_restore+0xc/0x20 kernel/locking/irqflag-debug.c:10, CPU#1: aoe_tx0/1321 Modules linked in: CPU: 1 UID: 0 PID: 1321 Comm: aoe_tx0 Not tainted syzkaller #0 PREEMPT_{RT,(full)} Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/24/2026 RIP: 0010:warn_bogus_irq_restore+0xc/0x20 kernel/locking/irqflag-debug.c:10 Call Trace: <TASK> backlog_unlock_irq_restore net/core/dev.c:253 [inline] enqueue_to_backlog+0x525/0xcf0 net/core/dev.c:5347 netif_rx_internal+0x120/0x550 net/core/dev.c:5659 __netif_rx+0xa9/0x110 net/core/dev.c:5679 loopback_xmit+0x43a/0x660 drivers/net/loopback.c:90 __netdev_start_xmit include/linux/netdevice.h:5275 [inline] netdev_start_xmit include/linux/netdevice.h:5284 [inline] xmit_one net/core/dev.c:3864 [inline] dev_hard_start_xmit+0x2df/0x830 net/core/dev.c:3880 __dev_queue_xmit+0x16f4/0x3990 net/core/dev.c:4829 dev_queue_xmit include/linux/netdevice.h:3384 [inline] Fixes: 27a01c1969a5 ("net: fully inline backlog_unlock_irq_restore()") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260213120427.2914544-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysprintk: add CONFIG_PRINTK dependency for netconsoleArnd Bergmann
The 'select PRINTK_EXECUTION_CTX' line now causes a harmless warning when NETCONSOLE_DYNAMIC is enabled but PRINTK is not: WARNING: unmet direct dependencies detected for PRINTK_EXECUTION_CTX Depends on [n]: PRINTK [=n] Selected by [y]: - NETCONSOLE_DYNAMIC [=y] && NETDEVICES [=y] && NET_CORE [=y] && NETCONSOLE [=y] && SYSFS [=y] && CONFIGFS_FS [=y] && (NETCONSOLE [=y]!=y [=y] || CONFIGFS_FS [=y]!=m [=m]) In that configuration, the netconsole driver is useless anyway, so avoid this with an added dependency that prevents CONFIG_NETCONSOLE to be enabled without CONFIG_PRINTK. Fixes: 60325c27d3cf ("printk: Add execution context (task name/CPU) to printk_info") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260213074431.1729627-1-arnd@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysMerge branch 'net-bridge-mcast-fix-mdb_n_entries-counting-warning'Paolo Abeni
Nikolay Aleksandrov says: ==================== net: bridge: mcast: fix mdb_n_entries counting warning The first patch fixes a warning in the mdb_n_entries code which was reported by syzbot, the second patch adds tests for different ways which used to trigger said warning. For more information please check the individual commit messages. ==================== Link: https://patch.msgid.link/20260213070031.1400003-1-nikolay@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysselftests: forwarding: bridge_mdb_max: add tests for mdb_n_entries warningNikolay Aleksandrov
Recently we were able to trigger a warning in the mdb_n_entries counting code. Add tests that exercise different ways which used to trigger that warning. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://patch.msgid.link/20260213070031.1400003-3-nikolay@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysnet: bridge: mcast: always update mdb_n_entries for vlan contextsNikolay Aleksandrov
syzbot triggered a warning[1] about the number of mdb entries in a context. It turned out that there are multiple ways to trigger that warning today (some got added during the years), the root cause of the problem is that the increase is done conditionally, and over the years these different conditions increased so there were new ways to trigger the warning, that is to do a decrease which wasn't paired with a previous increase. For example one way to trigger it is with flush: $ ip l add br0 up type bridge vlan_filtering 1 mcast_snooping 1 $ ip l add dumdum up master br0 type dummy $ bridge mdb add dev br0 port dumdum grp 239.0.0.1 permanent vid 1 $ ip link set dev br0 down $ ip link set dev br0 type bridge mcast_vlan_snooping 1 ^^^^ this will enable snooping, but will not update mdb_n_entries because in __br_multicast_enable_port_ctx() we check !netif_running $ bridge mdb flush dev br0 ^^^ this will trigger the warning because it will delete the pg which we added above, which will try to decrease mdb_n_entries Fix the problem by removing the conditional increase and always keep the count up-to-date while the vlan exists. In order to do that we have to first initialize it on port-vlan context creation, and then always increase or decrease the value regardless of mcast options. To keep the current behaviour we have to enforce the mdb limit only if the context is port's or if the port-vlan's mcast snooping is enabled. [1] ------------[ cut here ]------------ n == 0 WARNING: net/bridge/br_multicast.c:718 at br_multicast_port_ngroups_dec_one net/bridge/br_multicast.c:718 [inline], CPU#0: syz.4.4607/22043 WARNING: net/bridge/br_multicast.c:718 at br_multicast_port_ngroups_dec net/bridge/br_multicast.c:771 [inline], CPU#0: syz.4.4607/22043 WARNING: net/bridge/br_multicast.c:718 at br_multicast_del_pg+0x1bbe/0x1e20 net/bridge/br_multicast.c:825, CPU#0: syz.4.4607/22043 Modules linked in: CPU: 0 UID: 0 PID: 22043 Comm: syz.4.4607 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/24/2026 RIP: 0010:br_multicast_port_ngroups_dec_one net/bridge/br_multicast.c:718 [inline] RIP: 0010:br_multicast_port_ngroups_dec net/bridge/br_multicast.c:771 [inline] RIP: 0010:br_multicast_del_pg+0x1bbe/0x1e20 net/bridge/br_multicast.c:825 Code: 41 5f 5d e9 04 7a 48 f7 e8 3f 73 5c f7 90 0f 0b 90 e9 cf fd ff ff e8 31 73 5c f7 90 0f 0b 90 e9 16 fd ff ff e8 23 73 5c f7 90 <0f> 0b 90 e9 60 fd ff ff e8 15 73 5c f7 eb 05 e8 0e 73 5c f7 48 8b RSP: 0018:ffffc9000c207220 EFLAGS: 00010293 RAX: ffffffff8a68042d RBX: ffff88807c6f1800 RCX: ffff888066e90000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff888066e90000 R09: 000000000000000c R10: 000000000000000c R11: 0000000000000000 R12: ffff8880303ef800 R13: dffffc0000000000 R14: ffff888050eb11c4 R15: 1ffff1100a1d6238 FS: 00007fa45921b6c0(0000) GS:ffff8881256f5000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa4591f9ff8 CR3: 0000000081df2000 CR4: 00000000003526f0 Call Trace: <TASK> br_mdb_flush_pgs net/bridge/br_mdb.c:1525 [inline] br_mdb_flush net/bridge/br_mdb.c:1544 [inline] br_mdb_del_bulk+0x5e2/0xb20 net/bridge/br_mdb.c:1561 rtnl_mdb_del+0x48a/0x640 net/core/rtnetlink.c:-1 rtnetlink_rcv_msg+0x77e/0xbe0 net/core/rtnetlink.c:6967 netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa68/0xad0 net/socket.c:2592 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2646 __sys_sendmsg net/socket.c:2678 [inline] __do_sys_sendmsg net/socket.c:2683 [inline] __se_sys_sendmsg net/socket.c:2681 [inline] __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fa45839aeb9 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fa45921b028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fa458615fa0 RCX: 00007fa45839aeb9 RDX: 0000000000000000 RSI: 00002000000000c0 RDI: 0000000000000004 RBP: 00007fa458408c1f R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fa458616038 R14: 00007fa458615fa0 R15: 00007fff0b59fae8 </TASK> Fixes: b57e8d870d52 ("net: bridge: Maintain number of MDB entries in net_bridge_mcast_port") Reported-by: syzbot+d5d1b7343531d17bd3c5@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/aYrWbRp83MQR1ife@debil/T/#t Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://patch.msgid.link/20260213070031.1400003-2-nikolay@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysnet: arcnet: com20020-pci: fix support for 2.5Mbit cardsEthan Nelson-Moore
Commit 8c14f9c70327 ("ARCNET: add com20020 PCI IDs with metadata") converted the com20020-pci driver to use a card info structure instead of a single flag mask in driver_data. However, it failed to take into account that in the original code, driver_data of 0 indicates a card with no special flags, not a card that should not have any card info structure. This introduced a null pointer dereference when cards with no flags were probed. Commit bd6f1fd5d33d ("net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()") then papered over this issue by rejecting cards with no driver_data instead of resolving the problem at its source. Fix the original issue by introducing a new card info structure for 2.5Mbit cards that does not set any flags and using it if no driver_data is present. Fixes: 8c14f9c70327 ("ARCNET: add com20020 PCI IDs with metadata") Fixes: bd6f1fd5d33d ("net: arcnet: com20020: Fix null-ptr-deref in com20020pci_probe()") Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Link: https://patch.msgid.link/20260213045510.32368-1-enelsonmoore@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysnet/rds: rds_sendmsg should not discard payload_lenAllison Henderson
Commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown") modifies rds_sendmsg to avoid enqueueing work while a tear down is in progress. However, it also changed the return value of rds_sendmsg to that of rds_send_xmit instead of the payload_len. This means the user may incorrectly receive errno values when it should have simply received a payload of 0 while the peer attempts a reconnections. So this patch corrects the teardown handling code to only use the out error path in that case, thus restoring the original payload_len return value. Fixes: 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Allison Henderson <achender@kernel.org> Link: https://patch.msgid.link/20260213035409.1963391-1-achender@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysxen-netback: reject zero-queue configuration from guestZiyi Guo
A malicious or buggy Xen guest can write "0" to the xenbus key "multi-queue-num-queues". The connect() function in the backend only validates the upper bound (requested_num_queues > xenvif_max_queues) but not zero, allowing requested_num_queues=0 to reach vzalloc(array_size(0, sizeof(struct xenvif_queue))), which triggers WARN_ON_ONCE(!size) in __vmalloc_node_range(). On systems with panic_on_warn=1, this allows a guest-to-host denial of service. The Xen network interface specification requires the queue count to be "greater than zero". Add a zero check to match the validation already present in xen-blkback, which has included this guard since its multi-queue support was added. Fixes: 8d3d53b3e433 ("xen-netback: Add support for multiple queues") Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://patch.msgid.link/20260212224040.86674-1-n7l8m4@u.northwestern.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysnet: usb: catc: enable basic endpoint checkingZiyi Guo
catc_probe() fills three URBs with hardcoded endpoint pipes without verifying the endpoint descriptors: - usb_sndbulkpipe(usbdev, 1) and usb_rcvbulkpipe(usbdev, 1) for TX/RX - usb_rcvintpipe(usbdev, 2) for interrupt status A malformed USB device can present these endpoints with transfer types that differ from what the driver assumes. Add a catc_usb_ep enum for endpoint numbers, replacing magic constants throughout. Add usb_check_bulk_endpoints() and usb_check_int_endpoints() calls after usb_set_interface() to verify endpoint types before use, rejecting devices with mismatched descriptors at probe time. Similar to - commit 90b7f2961798 ("net: usb: rtl8150: enable basic endpoint checking") which fixed the issue in rtl8150. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu> Link: https://patch.msgid.link/20260212214154.3609844-1-n7l8m4@u.northwestern.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 daysovpn: tcp - don't deref NULL sk_socket member after tcp_close()Antonio Quartulli
When deleting a peer in case of keepalive expiration, the peer is removed from the OpenVPN hashtable and is temporary inserted in a "release list" for further processing. This happens in: ovpn_peer_keepalive_work() unlock_ovpn(release_list) This processing includes detaching from the socket being used to talk to this peer, by restoring its original proto and socket ops/callbacks. In case of TCP it may happen that, while the peer is sitting in the release list, userspace decides to close the socket. This will result in a concurrent execution of: tcp_close(sk) __tcp_close(sk) sock_orphan(sk) sk_set_socket(sk, NULL) The last function call will set sk->sk_socket to NULL. When the releasing routine is resumed, ovpn_tcp_socket_detach() will attempt to dereference sk->sk_socket to restore its original ops member. This operation will crash due to sk->sk_socket being NULL. Fix this race condition by testing-and-accessing sk->sk_socket atomically under sk->sk_callback_lock. Link: https://lore.kernel.org/netdev/176996279620.3109699.15382994681575380467@eldamar.lan/ Link: https://github.com/OpenVPN/ovpn-net-next/issues/29 Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Fixes: 11851cbd60ea ("ovpn: implement TCP transport") Link: https://patch.msgid.link/20260212213130.11497-1-antonio@openvpn.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 daysMerge tag 'ovpn-net-20260212' of https://github.com/OpenVPN/ovpn-net-nextPaolo Abeni
Antonio Quartulli says: ==================== This batch includes the following fixes: * set sk_user_data before installing callbacks, to avoid dropping early packets * fix use-after-free in ovpn_net_xmit when accessing shared skbs that got released * fix TX bytes stats by adding-up every positively processed GSO segment ==================== Link: https://patch.msgid.link/20260212210340.11260-1-antonio@openvpn.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 daysMerge branch 'eth-fbnic-fix-and-improve-header-data-split-configuration'Paolo Abeni
Bobby Eshleman says: ==================== eth: fbnic: fix and improve header/data split configuration This series fixes TCP HDS configuration in the fbnic driver and adds a devmem selftest for the intended behavior. The issues fixed include setting the correct CSR for EN_HDR_SPLIT, adjusting the hds threshold clamp to avoid long headers overflowing into the payload page, and configuring the device to use L4/L3/L2 header boundaries when present by programming the DMA hint bit unconditionally for all steering rule types. Prior to these fixes, small payloads fail devmem.check_rx_hds(). Testing: IFC0=enp1s0 IFC1=eth0 export REMOTE_TYPE=netns export REMOTE_ARGS=ns-remote export NETIF=$IFC0 export LOCAL_V4=192.0.3.1 export REMOTE_V4=192.0.3.2 sysctl -w net.ipv6.conf.$IFC0.keep_addr_on_down=1 ./tools/testing/selftests/drivers/net/hw/devmem.py TAP version 13 1..4 ok 1 devmem.check_rx ok 2 devmem.check_tx ok 3 devmem.check_tx_chunks ok 4 devmem.check_rx_hds To: Alexander Duyck <alexanderduyck@fb.com> To: Jakub Kicinski <kuba@kernel.org> To: kernel-team@meta.com To: Andrew Lunn <andrew+netdev@lunn.ch> To: David S. Miller <davem@davemloft.net> To: Eric Dumazet <edumazet@google.com> To: Paolo Abeni <pabeni@redhat.com> To: Mohsin Bashir <mohsin.bashr@gmail.com> To: Shuah Khan <shuah@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> ==================== Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-0-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 daysselftests: drv-net: add HDS payload sweep test for devmem TCPBobby Eshleman
Add check_rx_hds test that verifies header/data split works across payload sizes. The test sweeps payload sizes from 1 byte to 8KB, if any data propagates up to userspace as SCM_DEVMEM_LINEAR, then the test fails. This shows that regardless of payload size, ncdevmem's configuration of hds-thresh to 0 is respected. Add -L (--fail-on-linear) flag to ncdevmem that causes the receiver to fail if any SCM_DEVMEM_LINEAR cmsg is received. Use socat option for fixed block sizing and tcp nodelay to disable nagle's algo to avoid buffering. Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-4-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 dayseth: fbnic: set DMA_HINT_L4 for all flowsBobby Eshleman
fbnic always advertises ETHTOOL_TCP_DATA_SPLIT_ENABLED via ethtool .get_ringparam. To enable proper splitting for all flow types, even for IP/Ethernet flows, this patch sets DMA_HINT_L4 unconditionally for all RSS and NFC flow steering rules. According to the spec, L4 falls back to L3 if no valid L4 is found, and L3 falls back to L2 if no L3 is found. This makes sure that the correct header boundary is used regardless of traffic type. This is important for zero-copy use cases where we must ensure that all ZC packets are split correctly. Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-3-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 dayseth: fbnic: increase FBNIC_HDR_BYTES_MIN from 128 to 256 bytesBobby Eshleman
Increase FBNIC_HDR_BYTES_MIN from 128 to 256 bytes. The previous minimum was too small to guarantee that very long L2+L3+L4 headers always fit within the header buffer. When EN_HDR_SPLIT is disabled and a packet exceeds MAX_HEADER_BYTES, splitting occurs at that byte offset instead of the header boundary, resulting in some of the header landing in the payload page. The increased minimum ensures headers always fit with the MAX_HEADER_BYTES cut off and land in the header page. Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-2-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 dayseth: fbnic: set FBNIC_QUEUE_RDE_CTL0_EN_HDR_SPLIT on RDE_CTL0Bobby Eshleman
Fix EN_HDR_SPLIT configuration by writing the field to RDE_CTL0 instead of RDE_CTL1. Because drop mode configuration and header splitting enablement both use RDE_CTL0, we consolidate these configurations into the single function fbnic_config_drop_mode. Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-1-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
8 daysfbnic: close fw_log race between users and teardownChengfeng Ye
Fixes a theoretical race on fw_log between the teardown path and fw_log write functions. fw_log is written inside fbnic_fw_log_write() and can be reached from the mailbox handler fbnic_fw_msix_intr(), but fw_log is freed before IRQ/MBX teardown during cleanup, resulting in a potential data race of dereferencing a freed/null variable. Possible Interleaving Scenario: CPU0: fbnic_fw_msix_intr() // Entry fbnic_fw_log_write() if (fbnic_fw_log_ready()) // true ... preempt ... CPU1: fbnic_remove() // Entry fbnic_fw_log_free() vfree(log->data_start); log->data_start = NULL; CPU0: continues, walks log->entries or writes to log->data_start The initialization also has an incorrect order problem, as the fw_log is currently allocated after MBX setup during initialization. Fix the problems by adjusting the synchronization order to put initialization in place before the mailbox is enabled, and not cleared until after the mailbox has been disabled. Fixes: ecc53b1b46c89 ("eth: fbnic: Enable firmware logging") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Link: https://patch.msgid.link/20260211191329.530886-1-dg573847474@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysMerge branch 'vsock-fix-child-netns-mode-initialization-and-restriction'Jakub Kicinski
Stefano Garzarella says: ==================== vsock: fix child netns mode initialization and restriction This series fixes two issues in the vsock network namespace support recently introduced by commit eafb64f40ca4 ("vsock: add netns to vsock core"). Patch 1 fixes `child_ns_mode` being always hardcoded to "global" for new namespaces, breaking propagation of the "local" mode through nested namespaces. Patch 2 prevents a "local" namespace from switching `child_ns_mode` to "global", which would allow nested namespaces to escape vsock isolation and access global CIDs. ==================== Link: https://patch.msgid.link/20260212205916.97533-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysvsock: prevent child netns mode switch from local to globalStefano Garzarella
A "local" namespace can change its `child_ns_mode` sysctl to "global", allowing nested namespaces to access global CIDs. This can be exploited by an unprivileged user who gained CAP_NET_ADMIN through a user namespace. Prevent this by rejecting writes that attempt to set `child_ns_mode` to "global" when the current namespace's mode is "local". Fixes: eafb64f40ca4 ("vsock: add netns to vsock core") Cc: bobbyeshleman@meta.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260212205916.97533-3-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysvsock: fix child netns mode initializationStefano Garzarella
When a new network namespace is created, vsock_net_init() correctly initializes the namespace's mode by reading the parent's `child_ns_mode` via vsock_net_child_mode(). However, the `child_ns_mode` of the new namespace was always hardcoded to VSOCK_NET_MODE_GLOBAL, regardless of its own mode. This means that if a parent namespace has `child_ns_mode` set to "local", the child namespace correctly gets mode "local", but its `child_ns_mode` is reset to "global". As a result, further nested namespaces will incorrectly get mode "global" instead of inheriting "local", breaking the expected propagation of the mode through nested namespaces. Fix this by initializing `child_ns_mode` to the namespace's own mode, so the setting propagates correctly through all levels of nesting. Fixes: eafb64f40ca4 ("vsock: add netns to vsock core") Cc: bobbyeshleman@meta.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260212205916.97533-2-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysnet: sparx5/lan969x: fix PTP clock max_adj valueDaniel Machon
The max_adj field in ptp_clock_info tells userspace how much the PHC clock frequency can be adjusted. ptp4l reads this and will never request a correction larger than max_adj. On both sparx5 and lan969x the clock offset may never converge because the servo needs a frequency correction larger than the current max_adj of 200000 (200 ppm) allows. The servo rails at the max and the offset stays in the tens of microseconds. The hardware has no inherent max adjustment limit; frequency correction is done by writing a 64-bit clock period increment to CLK_PER_CFG, and the register has plenty of range. The 200000 value was just an overly conservative software limit. The max_adj is shared between sparx5 and lan969x, and the increased value is safe for both. Fix this by increasing max_adj to 10000000 (10000 ppm), giving the servo sufficient headroom. Fixes: 0933bd04047c ("net: sparx5: Add support for ptp clocks") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260212-sparx5-ptp-max-adj-v2-v1-1-06b200e50ce3@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysipv6: Fix out-of-bound access in fib6_add_rt2node().Kuniyuki Iwashima
syzbot reported out-of-bound read in fib6_add_rt2node(). [0] When IPv6 route is created with RTA_NH_ID, struct fib6_info does not have the trailing struct fib6_nh. The cited commit started to check !iter->fib6_nh->fib_nh_gw_family to ensure that rt6_qualify_for_ecmp() will return false for iter. If iter->nh is not NULL, rt6_qualify_for_ecmp() returns false anyway. Let's check iter->nh before reading iter->fib6_nh and avoid OOB read. [0]: BUG: KASAN: slab-out-of-bounds in fib6_add_rt2node+0x349c/0x3500 net/ipv6/ip6_fib.c:1142 Read of size 1 at addr ffff8880384ba6de by task syz.0.18/5500 CPU: 0 UID: 0 PID: 5500 Comm: syz.0.18 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xba/0x230 mm/kasan/report.c:482 kasan_report+0x117/0x150 mm/kasan/report.c:595 fib6_add_rt2node+0x349c/0x3500 net/ipv6/ip6_fib.c:1142 fib6_add_rt2node_nh net/ipv6/ip6_fib.c:1363 [inline] fib6_add+0x910/0x18c0 net/ipv6/ip6_fib.c:1531 __ip6_ins_rt net/ipv6/route.c:1351 [inline] ip6_route_add+0xde/0x1b0 net/ipv6/route.c:3957 inet6_rtm_newroute+0x268/0x19e0 net/ipv6/route.c:5660 rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6958 netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa68/0xad0 net/socket.c:2592 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2646 __sys_sendmsg net/socket.c:2678 [inline] __do_sys_sendmsg net/socket.c:2683 [inline] __se_sys_sendmsg net/socket.c:2681 [inline] __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f9316b9aeb9 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd8809b678 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f9316e15fa0 RCX: 00007f9316b9aeb9 RDX: 0000000000000000 RSI: 0000200000004380 RDI: 0000000000000003 RBP: 00007f9316c08c1f R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f9316e15fac R14: 00007f9316e15fa0 R15: 00007f9316e15fa0 </TASK> Allocated by task 5499: kasan_save_stack mm/kasan/common.c:57 [inline] kasan_save_track+0x3e/0x80 mm/kasan/common.c:78 poison_kmalloc_redzone mm/kasan/common.c:398 [inline] __kasan_kmalloc+0x93/0xb0 mm/kasan/common.c:415 kasan_kmalloc include/linux/kasan.h:263 [inline] __do_kmalloc_node mm/slub.c:5657 [inline] __kmalloc_noprof+0x40c/0x7e0 mm/slub.c:5669 kmalloc_noprof include/linux/slab.h:961 [inline] kzalloc_noprof include/linux/slab.h:1094 [inline] fib6_info_alloc+0x30/0xf0 net/ipv6/ip6_fib.c:155 ip6_route_info_create+0x142/0x860 net/ipv6/route.c:3820 ip6_route_add+0x49/0x1b0 net/ipv6/route.c:3949 inet6_rtm_newroute+0x268/0x19e0 net/ipv6/route.c:5660 rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6958 netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa68/0xad0 net/socket.c:2592 ___sys_sendmsg+0x2a5/0x360 net/socket.c:2646 __sys_sendmsg net/socket.c:2678 [inline] __do_sys_sendmsg net/socket.c:2683 [inline] __se_sys_sendmsg net/socket.c:2681 [inline] __x64_sys_sendmsg+0x1bd/0x2a0 net/socket.c:2681 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: bbf4a17ad9ff ("ipv6: Fix ECMP sibling count mismatch when clearing RTF_ADDRCONF") Reported-by: syzbot+707d6a5da1ab9e0c6f9d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/698cbfba.050a0220.2eeac1.009d.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Shigeru Yoshida <syoshida@redhat.com> Link: https://patch.msgid.link/20260211175133.3657034-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()Qanux
On the receive path, __ioam6_fill_trace_data() uses trace->nodelen to decide how much data to write for each node. It trusts this field as-is from the incoming packet, with no consistency check against trace->type (the 24-bit field that tells which data items are present). A crafted packet can set nodelen=0 while setting type bits 0-21, causing the function to write ~100 bytes past the allocated region (into skb_shared_info), which corrupts adjacent heap memory and leads to a kernel panic. Add a shared helper ioam6_trace_compute_nodelen() in ioam6.c to derive the expected nodelen from the type field, and use it: - in ioam6_iptunnel.c (send path, existing validation) to replace the open-coded computation; - in exthdrs.c (receive path, ipv6_hop_ioam) to drop packets whose nodelen is inconsistent with the type field, before any data is written. Per RFC 9197, bits 12-21 are each short (4-octet) fields, so they are included in IOAM6_MASK_SHORT_FIELDS (changed from 0xff100000 to 0xff1ffc00). Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Cc: stable@vger.kernel.org Signed-off-by: Junxi Qian <qjx1298677004@gmail.com> Reviewed-by: Justin Iurman <justin.iurman@gmail.com> Link: https://patch.msgid.link/20260211040412.86195-1-qjx1298677004@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysselftests: netconsole: Increase port listening timeoutPin-yen Lin
wait_for_port() can wait up to 2 seconds with the sleep and the polling in wait_local_port_listen() combined. So, in netcons_basic.sh, the socat process could die before the test writes to the netconsole. Increase the timeout to 3 seconds to make netcons_basic.sh pass consistently. Fixes: 3dc6c76391cb ("selftests: net: Add IPv6 support to netconsole basic tests") Signed-off-by: Pin-yen Lin <treapking@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260210005939.3230550-1-treapking@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysMerge branch 'net-mscc-ocelot-fix-missing-lock-in-ocelot_port_xmit'Jakub Kicinski
Ziyi Guo says: ==================== net: mscc: ocelot: fix missing lock in ocelot_port_xmit() ocelot_port_xmit() calls ocelot_can_inject() and ocelot_port_inject_frame() without holding the injection group lock. Both functions contain lockdep_assert_held() for the injection lock, and the correct caller felix_port_deferred_xmit() properly acquires the lock using ocelot_lock_inj_grp() before calling these functions. this v3 splits the fix into a 3-patch series to separate refactoring from the behavioral change: 1/3: Extract the PTP timestamp handling into an ocelot_xmit_timestamp() helper so the logic isn't duplicated when the function is split. 2/3: Split ocelot_port_xmit() into ocelot_port_xmit_fdma() and ocelot_port_xmit_inj(), keeping the FDMA and register injection code paths fully separate. 3/3: Add ocelot_lock_inj_grp()/ocelot_unlock_inj_grp() in ocelot_port_xmit_inj() to fix the missing lock protection. Patches 1-2 are pure refactors with no behavioral change. Patch 3 is the actual bug fix. ==================== Link: https://patch.msgid.link/20260208225602.1339325-1-n7l8m4@u.northwestern.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: mscc: ocelot: add missing lock protection in ocelot_port_xmit_inj()Ziyi Guo
ocelot_port_xmit_inj() calls ocelot_can_inject() and ocelot_port_inject_frame() without holding the injection group lock. Both functions contain lockdep_assert_held() for the injection lock, and the correct caller felix_port_deferred_xmit() properly acquires the lock using ocelot_lock_inj_grp() before calling these functions. Add ocelot_lock_inj_grp()/ocelot_unlock_inj_grp() around the register injection path to fix the missing lock protection. The FDMA path is not affected as it uses its own locking mechanism. Fixes: c5e12ac3beb0 ("net: mscc: ocelot: serialize access to the injection/extraction groups") Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260208225602.1339325-4-n7l8m4@u.northwestern.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: mscc: ocelot: split xmit into FDMA and register injection pathsZiyi Guo
Split ocelot_port_xmit() into two separate functions: - ocelot_port_xmit_fdma(): handles the FDMA injection path - ocelot_port_xmit_inj(): handles the register-based injection path The top-level ocelot_port_xmit() now dispatches to the appropriate function based on the ocelot_fdma_enabled static key. This is a pure refactor with no behavioral change. Separating the two code paths makes each one simpler and prepares for adding proper locking to the register injection path without affecting the FDMA path. Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260208225602.1339325-3-n7l8m4@u.northwestern.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: mscc: ocelot: extract ocelot_xmit_timestamp() helperZiyi Guo
Extract the PTP timestamp handling logic from ocelot_port_xmit() into a separate ocelot_xmit_timestamp() helper function. This is a pure refactor with no behavioral change. The helper returns false if the skb was consumed (freed) due to a timestamp request failure, and true if the caller should continue with frame injection. The rew_op value is returned via pointer. This prepares for splitting ocelot_port_xmit() into separate FDMA and register injection paths in a subsequent patch. Signed-off-by: Ziyi Guo <n7l8m4@u.northwestern.edu> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260208225602.1339325-2-n7l8m4@u.northwestern.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: sparx5/lan969x: fix DWRR cost max to match hardware register widthDaniel Machon
DWRR (Deficit Weighted Round Robin) scheduling distributes bandwidth across traffic classes based on per-queue cost values, where lower cost means higher bandwidth share. The SPX5_DWRR_COST_MAX constant is 63 (6 bits) but the hardware register field HSCH_DWRR_ENTRY_DWRR_COST is GENMASK(24, 20), only 5 bits wide (max 31). This causes sparx5_weight_to_hw_cost() to compute cost values that silently overflow via FIELD_PREP, resulting in incorrect scheduling weights. Set SPX5_DWRR_COST_MAX to 31 to match the hardware register width. Fixes: 211225428d65 ("net: microchip: sparx5: add support for offloading ets qdisc") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260210-sparx5-fix-dwrr-cost-max-v1-1-58fbdbc25652@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysatm: fore200e: fix use-after-free in tasklets during device removalDuoming Zhou
When the PCA-200E or SBA-200E adapter is being detached, the fore200e is deallocated. However, the tx_tasklet or rx_tasklet may still be running or pending, leading to use-after-free bug when the already freed fore200e is accessed again in fore200e_tx_tasklet() or fore200e_rx_tasklet(). One of the race conditions can occur as follows: CPU 0 (cleanup) | CPU 1 (tasklet) fore200e_pca_remove_one() | fore200e_interrupt() fore200e_shutdown() | tasklet_schedule() kfree(fore200e) | fore200e_tx_tasklet() | fore200e-> // UAF Fix this by ensuring tx_tasklet or rx_tasklet is properly canceled before the fore200e is released. Add tasklet_kill() in fore200e_shutdown() to synchronize with any pending or running tasklets. Moreover, since fore200e_reset() could prevent further interrupts or data transfers, the tasklet_kill() should be placed after fore200e_reset() to prevent the tasklet from being rescheduled in fore200e_interrupt(). Finally, it only needs to do tasklet_kill() when the fore200e state is greater than or equal to FORE200E_STATE_IRQ, since tasklets are uninitialized in earlier states. In a word, the tasklet_kill() should be placed in the FORE200E_STATE_IRQ branch within the switch...case structure. This bug was identified through static analysis. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@kernel.org Suggested-by: Jijie Shao <shaojijie@huawei.com> Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260210094537.9767-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: stmmac: fix oops when split header is enabledJie Zhang
For GMAC4, when split header is enabled, in some rare cases, the hardware does not fill buf2 of the first descriptor with payload. Thus we cannot assume buf2 is always fully filled if it is not the last descriptor. Otherwise, the length of buf2 of the second descriptor will be calculated wrong and cause an oops: Unable to handle kernel paging request at virtual address ffff00019246bfc0 ... x2 : 0000000000000040 x1 : ffff00019246bfc0 x0 : ffff00009246c000 Call trace: dcache_inval_poc+0x28/0x58 (P) dma_direct_sync_single_for_cpu+0x38/0x6c __dma_sync_single_for_cpu+0x34/0x6c stmmac_napi_poll_rx+0x8f0/0xb60 __napi_poll.constprop.0+0x30/0x144 net_rx_action+0x160/0x274 handle_softirqs+0x1b8/0x1fc ... To fix this, the PL bit-field in RDES3 register is used for all descriptors, whether it is the last descriptor or not. Fixes: ec222003bd94 ("net: stmmac: Prepare to add Split Header support") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jie Zhang <jie.zhang@analog.com> Link: https://patch.msgid.link/20260209225037.589130-1-jie.zhang@analog.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: intel: fix PCI device ID conflict between i40e and ipw2200Ethan Nelson-Moore
The ID 8086:104f is matched by both i40e and ipw2200. The same device ID should not be in more than one driver, because in that case, which driver is used is unpredictable. Fix this by taking advantage of the fact that i40e devices use PCI_CLASS_NETWORK_ETHERNET and ipw2200 devices use PCI_CLASS_NETWORK_OTHER to differentiate the devices. Fixes: 2e45d3f4677a ("i40e: Add support for X710 B/P & SFP+ cards") Cc: stable@vger.kernel.org Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20260210021235.16315-1-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysselftests: drv-net: limit RPS test CPUs to supported rangeGal Pressman
The _get_unused_cpus() function can return CPU numbers >= 16, which exceeds RPS_MAX_CPUS in toeplitz.c. When this happens, the test fails with a cryptic message: # Exception| Traceback (most recent call last): # Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/ksft.py", line 319, in ksft_run # Exception| func(*args) # Exception| File "/tmp/cur/linux/tools/testing/selftests/drivers/net/hw/toeplitz.py", line 189, in test # Exception| with bkg(" ".join(rx_cmd), ksft_ready=True, exit_wait=True) as rx_proc: # Exception| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ # Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/utils.py", line 124, in __init__ # Exception| super().__init__(comm, background=True, # Exception| File "/tmp/cur/linux/tools/testing/selftests/net/lib/py/utils.py", line 77, in __init__ # Exception| raise Exception("Did not receive ready message") # Exception| Exception: Did not receive ready message Rename _get_unused_cpus() to _get_unused_rps_cpus() and cap the CPU search range to RPS_MAX_CPUS. Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260210093110.1935149-1-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysselftests: net: lib: Fix jq parsing errorYue Haibing
The testcase failed as below: $./vlan_bridge_binding.sh ... + adf_ip_link_set_up d1 + local name=d1 + shift + ip_link_is_up d1 + ip_link_has_flag d1 UP + local name=d1 + shift + local flag=UP + shift ++ ip -j link show d1 ++ jq --arg flag UP 'any(.[].flags.[]; . == $flag)' jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1: any(.[].flags.[]; . == $flag) jq: 1 compile error Remove the extra dot (.) after flags array to fix this. Fixes: 4baa1d3a5080 ("selftests: net: lib: Add ip_link_has_flag()") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260211022146.190948-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysbng_en: Remove duplicate includeChen Ni
Remove duplicate inclusion of <net/netdev_queues.h>. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Link: https://patch.msgid.link/20260211032021.2719742-1-nichen@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysselftests: mlxsw: tc_restrictions: Fix test failure with new iproute2Ido Schimmel
As explained in [1], iproute2 started rejecting tc-police burst sizes that result in an overflow. This can happen when the burst size is high enough and the rate is low enough. A couple of test cases specify such configurations, resulting in iproute2 errors and test failure. Fix by reducing the burst size so that the test will pass with both new and old iproute2 versions. [1] https://lore.kernel.org/netdev/20250916215731.3431465-1-jay.vosburgh@canonical.com/ Fixes: cb12d1763267 ("selftests: mlxsw: tc_restrictions: Test tc-police restrictions") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/88b00c6e85188aa6a065dc240206119b328c46e1.1770643998.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: mctp: ensure our nlmsg responses are initialisedJeremy Kerr
Syed Faraz Abrar (@farazsth98) from Zellic, and Pumpkin (@u1f383) from DEVCORE Research Team working with Trend Micro Zero Day Initiative report that a RTM_GETNEIGH will return uninitalised data in the pad bytes of the ndmsg data. Ensure we're initialising the netlink data to zero, in the link, addr and neigh response messages. Fixes: 831119f88781 ("mctp: Add neighbour netlink interface") Fixes: 06d2f4c583a7 ("mctp: Add netlink route management") Fixes: 583be982d934 ("mctp: Add device handling and netlink interface") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260209-dev-mctp-nlmsg-v1-1-f1e30c346a43@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysmyri10ge: replace comma with semicolonsChen Ni
Commit fd24173439c0 ("myri10ge: avoid uninitialized variable use") added commas instead of semicolons Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260212055028.3248491-1-nichen@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysMerge branch 'net-phy_port-sfp-modules-representation-and-phy_port-listing'Jakub Kicinski
Maxime Chevallier says: ==================== net: phy_port: SFP modules representation and phy_port listing (part) ==================== Applying just the initial cleanup + fixes from the series. Link: https://patch.msgid.link/20260205092317.755906-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: phy: phy_port: Correctly recompute the port's linkmodesMaxime Chevallier
a PHY-driven phy_port contains a 'supported' field containing the linkmodes available on this port. This is populated based on : - The PHY's reported features - The DT representation of the connector - The PHY's attach_mdi() callback As these different attrbution methods work in conjunction, the helper phy_port_update_supported() recomputes the final 'supported' value based on the populated mediums, linkmodes and pairs. However this recompute wasn't correctly implemented, and added more modes than necessary by or'ing the medium-specific modes to the existing support. Let's fix this and properly filter the modes. Fixes: 589e934d2735 ("net: phy: Introduce PHY ports representation") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Link: https://patch.msgid.link/20260205092317.755906-4-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: phy: phy_port: Cleanup the of-parsing logic for phy_portMaxime Chevallier
We don't need to maintain a mediums bitfield, let's drop it and drop a bogus check for empty mediums, as we already check it above. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Link: https://patch.msgid.link/20260205092317.755906-3-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: phy: initialize the port support based on the PHY's for OF portsMaxime Chevallier
With the phy_port infrastructure came an ethernet-connector binding, allowing to represent the MDI of a PHY in devicetree. This allows specifying the mediums and pairs of a port. Let's initialize the port's supported list based on what the PHY reports, so that we can then filter it with what the connector allows using. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Link: https://patch.msgid.link/20260205092317.755906-2-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysovpn: fix VPN TX bytes countingRalf Lici
In ovpn_net_xmit, after GSO segmentation and segment processing, the first segment on the list is used to increment VPN TX statistics, which fails to account for any subsequent segments in the chain. Fix this by accumulating the length of every segment that successfully passes skb_share_check into a tx_bytes variable. This ensures the peer statistics accurately reflect the total data volume sent, regardless of whether the original packet was segmented. Fixes: 04ca14955f9a ("ovpn: store tunnel and transport statistics") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
9 daysovpn: fix possible use-after-free in ovpn_net_xmitRalf Lici
When building the skb_list in ovpn_net_xmit, skb_share_check will free the original skb if it is shared. The current implementation continues to use the stale skb pointer for subsequent operations: - peer lookup, - skb_dst_drop (even though all segments produced by skb_gso_segment will have a dst attached), - ovpn_peer_stats_increment_tx. Fix this by moving the peer lookup and skb_dst_drop before segmentation so that the original skb is still valid when used. Return early if all segments fail skb_share_check and the list ends up empty. Also switch ovpn_peer_stats_increment_tx to use skb_list.next; the next patch fixes the stats logic. Fixes: 08857b5ec5d9 ("ovpn: implement basic TX path (UDP)") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
9 daysovpn: set sk_user_data before overriding callbacksRalf Lici
During initialization, we override socket callbacks and set sk_user_data to an ovpn_socket instance. Currently, these two operations are decoupled: callbacks are overridden before sk_user_data is set. While existing callbacks perform safety checks for NULL or non-ovpn sk_user_data, this condition causes a "half-formed" state where valid packets arriving during attachment trigger error logs (e.g., "invoked on non ovpn socket"). Set sk_user_data before overriding the callbacks so that it can be accessed safely from them. Since we already check that the socket has no sk_user_data before setting it, this remains safe even if an interrupt accesses the socket after sk_user_data is set but before the callbacks are overridden. This also requires initializing all protocol-specific fields (such as tcp_tx_work and peer links) before calling ovpn_socket_attach, ensuring the ovpn_socket is fully formed before it becomes visible to any callback. Fixes: f6226ae7a0cd ("ovpn: introduce the ovpn_socket object") Signed-off-by: Ralf Lici <ralf@mandelbit.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
10 daysMerge tag 'net-next-7.0' of ↵ipvs-next/mainipvs-next/HEADLinus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core & protocols: - A significant effort all around the stack to guide the compiler to make the right choice when inlining code, to avoid unneeded calls for small helper and stack canary overhead in the fast-path. This generates better and faster code with very small or no text size increases, as in many cases the call generated more code than the actual inlined helper. - Extend AccECN implementation so that is now functionally complete, also allow the user-space enabling it on a per network namespace basis. - Add support for memory providers with large (above 4K) rx buffer. Paired with hw-gro, larger rx buffer sizes reduce the number of buffers traversing the stack, dincreasing single stream CPU usage by up to ~30%. - Do not add HBH header to Big TCP GSO packets. This simplifies the RX path, the TX path and the NIC drivers, and is possible because user-space taps can now interpret correctly such packets without the HBH hint. - Allow IPv6 routes to be configured with a gateway address that is resolved out of a different interface than the one specified, aligning IPv6 to IPv4 behavior. - Multi-queue aware sch_cake. This makes it possible to scale the rate shaper of sch_cake across multiple CPUs, while still enforcing a single global rate on the interface. - Add support for the nbcon (new buffer console) infrastructure to netconsole, enabling lock-free, priority-based console operations that are safer in crash scenarios. - Improve the TCP ipv6 output path to cache the flow information, saving cpu cycles, reducing cache line misses and stack use. - Improve netfilter packet tracker to resolve clashes for most protocols, avoiding unneeded drops on rare occasions. - Add IP6IP6 tunneling acceleration to the flowtable infrastructure. - Reduce tcp socket size by one cache line. - Notify neighbour changes atomically, avoiding inconsistencies between the notification sequence and the actual states sequence. - Add vsock namespace support, allowing complete isolation of vsocks across different network namespaces. - Improve xsk generic performances with cache-alignment-oriented optimizations. - Support netconsole automatic target recovery, allowing netconsole to reestablish targets when underlying low-level interface comes back online. Driver API: - Support for switching the working mode (automatic vs manual) of a DPLL device via netlink. - Introduce PHY ports representation to expose multiple front-facing media ports over a single MAC. - Introduce "rx-polarity" and "tx-polarity" device tree properties, to generalize polarity inversion requirements for differential signaling. - Add helper to create, prepare and enable managed clocks. Device drivers: - Add Huawei hinic3 PF etherner driver. - Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet controller. - Add ethernet driver for MaxLinear MxL862xx switches - Remove parallel-port Ethernet driver. - Convert existing driver timestamp configuration reporting to hwtstamp_get and remove legacy ioctl(). - Convert existing drivers to .get_rx_ring_count(), simplifing the RX ring count retrieval. Also remove the legacy fallback path. - Ethernet high-speed NICs: - Broadcom (bnxt, bng): - bnxt: add FW interface update to support FEC stats histogram and NVRAM defragmentation - bng: add TSO and H/W GRO support - nVidia/Mellanox (mlx5): - improve latency of channel restart operations, reducing the used H/W resources - add TSO support for UDP over GRE over VLAN - add flow counters support for hardware steering (HWS) rules - use a static memory area to store headers for H/W GRO, leading to 12% RX tput improvement - Intel (100G, ice, idpf): - ice: reorganizes layout of Tx and Rx rings for cacheline locality and utilizes __cacheline_group* macros on the new layouts - ice: introduces Synchronous Ethernet (SyncE) support - Meta (fbnic): - adds debugfs for firmware mailbox and tx/rx rings vectors - Ethernet virtual: - geneve: introduce GRO/GSO support for double UDP encapsulation - Ethernet NICs consumer, and embedded: - Synopsys (stmmac): - some code refactoring and cleanups - RealTek (r8169): - add support for RTL8127ATF (10G Fiber SFP) - add dash and LTR support - Airoha: - AN8811HB 2.5 Gbps phy support - Freescale (fec): - add XDP zero-copy support - Thunderbolt: - add get link setting support to allow bonding - Renesas: - add support for RZ/G3L GBETH SoC - Ethernet switches: - Maxlinear: - support R(G)MII slow rate configuration - add support for Intel GSW150 - Motorcomm (yt921x): - add DCB/QoS support - TI: - icssm-prueth: support bridging (STP/RSTP) via the switchdev framework - Ethernet PHYs: - Realtek: - enable SGMII and 2500Base-X in-band auto-negotiation - simplify and reunify C22/C45 drivers - Micrel: convert bindings to DT schema - CAN: - move skb headroom content into skb extensions, making CAN metadata access more robust - CAN drivers: - rcar_canfd: - add support for FD-only mode - add support for the RZ/T2H SoC - sja1000: cleanup the CAN state handling - WiFi: - implement EPPKE/802.1X over auth frames support - split up drop reasons better, removing generic RX_DROP - additional FTM capabilities: 6 GHz support, supported number of spatial streams and supported number of LTF repetitions - better mac80211 iterators to enumerate resources - initial UHR (Wi-Fi 8) support for cfg80211/mac80211 - WiFi drivers: - Qualcomm/Atheros: - ath11k: support for Channel Frequency Response measurement - ath12k: a significant driver refactor to support multi-wiphy devices and and pave the way for future device support in the same driver (rather than splitting to ath13k) - ath12k: support for the QCC2072 chipset - Intel: - iwlwifi: partial Neighbor Awareness Networking (NAN) support - iwlwifi: initial support for U-NII-9 and IEEE 802.11bn - RealTek (rtw89): - preparations for RTL8922DE support - Bluetooth: - implement setsockopt(BT_PHY) to set the connection packet type/PHY - set link_policy on incoming ACL connections - Bluetooth drivers: - btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE - btqca: add WCN6855 firmware priority selection feature" * tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1254 commits) bnge/bng_re: Add a new HSI net: macb: Fix tx/rx malfunction after phy link down and up af_unix: Fix memleak of newsk in unix_stream_connect(). net: ti: icssg-prueth: Add optional dependency on HSR net: dsa: add basic initial driver for MxL862xx switches net: mdio: add unlocked mdiodev C45 bus accessors net: dsa: add tag format for MxL862xx switches dt-bindings: net: dsa: add MaxLinear MxL862xx selftests: drivers: net: hw: Modify toeplitz.c to poll for packets octeontx2-pf: Unregister devlink on probe failure net: renesas: rswitch: fix forwarding offload statemachine ionic: Rate limit unknown xcvr type messages tcp: inet6_csk_xmit() optimization tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock() tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect() ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6 ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update() ipv6: use np->final in inet6_sk_rebuild_header() ipv6: add daddr/final storage in struct ipv6_pinfo net: stmmac: qcom-ethqos: fix qcom_ethqos_serdes_powerup() ...