summaryrefslogtreecommitdiff
path: root/Documentation/networking
AgeCommit message (Collapse)Author
2025-03-21net/mlx5e: Expose port reset cycle recovery counter via ethtoolYael Chemla
Display recovery event of PPCNT recovery counters group. Counts (per link) the number of total successful recovery events of any recovery types during port reset cycle. Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/1742112876-2890-5-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21docs: netconsole: document release featureBreno Leitao
Add documentation explaining the kernel release auto-population feature in netconsole. This feature appends kernel version information to the userdata dictionary in every message sent when enabled via the `release_enabled` file in the configfs hierarchy. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-netcons_release-v1-6-07979c4b86af@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21net: remove sb1000 cable modem driverArnd Bergmann
This one is hilariously outdated, it provided a faster downlink over TV cable for users of analog modems in the 1990s, through an ISA card. The web page for the userspace tools has been broken for 25 years, and the driver has only ever seen mechanical updates. Link: http://web.archive.org/web/20000611165545/http://home.adelphia.net:80/~siglercm/sb1000.html Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312085236.2531870-1-arnd@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20mptcp: sysctl: add available_path_managersGeliang Tang
Similarly to net.mptcp.available_schedulers, this patch adds a new one net.mptcp.available_path_managers to list the available path managers. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-11-f4e4a88efc50@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20mptcp: sysctl: set path manager by nameGeliang Tang
Similar to net.mptcp.scheduler, a new net.mptcp.path_manager sysctl knob is added to determine which path manager will be used by each newly created MPTCP socket by setting the name of it. Dealing with an explicit name is easier than with a number, especially when more PMs will be introduced. This sysctl knob makes the old one "pm_type" deprecated. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-8-f4e4a88efc50@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19netconsole: allow selection of egress interface via MAC addressUday Shankar
Currently, netconsole has two methods of configuration - module parameter and configfs. The former interface allows for netconsole activation earlier during boot (by specifying the module parameter on the kernel command line), so it is preferred for debugging issues which arise before userspace is up/the configfs interface can be used. The module parameter syntax requires specifying the egress interface name. This requirement makes it hard to use for a couple reasons: - The egress interface name can be hard or impossible to predict. For example, installing a new network card in a system can change the interface names assigned by the kernel. - When constructing the module parameter, one may have trouble determining the original (kernel-assigned) name of the interface (which is the name that should be given to netconsole) if some stable interface naming scheme is in effect. A human can usually look at kernel logs to determine the original name, but this is very painful if automation is constructing the parameter. For these reasons, allow selection of the egress interface via MAC address when configuring netconsole using the module parameter. Update the netconsole documentation with an example of the new syntax. Selection of egress interface by MAC address via configfs is far less interesting (since when this interface can be used, one should be able to easily convert between MAC address and interface name), so it is left unimplemented. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Breno Leitao <leitao@debian.org> Tested-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312-netconsole-v6-2-3437933e79b8@purestorage.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18Merge tag 'batadv-next-pullrequest-20250313' of ↵Paolo Abeni
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - drop batadv_priv_debug_log struct, by Sven Eckelmann - adopt netdev_hold() / netdev_put(), by Eric Dumazet - add support for jumbo frames, by Sven Eckelmann - use consistent name for mesh interface, by Sven Eckelmann - cleanup B.A.T.M.A.N. IV OGM aggregation handling, by Sven Eckelmann (4 patches) - add missing newlines for log macros, by Sven Eckelmann * tag 'batadv-next-pullrequest-20250313' of git://git.open-mesh.org/linux-merge: batman-adv: add missing newlines for log macros batman-adv: Limit aggregation size to outgoing MTU batman-adv: Use actual packet count for aggregated packets batman-adv: Switch to bitmap helper for aggregation handling batman-adv: Limit number of aggregated packets directly batman-adv: Use consistent name for mesh interface batman-adv: Add support for jumbo frames batman-adv: adopt netdev_hold() / netdev_put() batman-adv: Drop batadv_priv_debug_log struct batman-adv: Start new development cycle ==================== Link: https://patch.msgid.link/20250313164519.72808-1-sw@simonwunderlich.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-18bnxt_en: Add devlink support for ENABLE_ROCE nvm parameterPavan Chebbi
Add set/show support for the ENABLE_ROCE NVM parameter to enable/disable RoCE for a PF. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Co-developed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250310183129.3154117-4-michael.chan@broadcom.com Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-12net: revert to lockless TC_SETUP_BLOCK and TC_SETUP_FTStanislav Fomichev
There is a couple of places from which we can arrive to ndo_setup_tc with TC_SETUP_BLOCK/TC_SETUP_FT: - netlink - netlink notifier - netdev notifier Locking netdev too deep in this call chain seems to be problematic (especially assuming some/all of the call_netdevice_notifiers NETDEV_UNREGISTER) might soon be running with the instance lock). Revert to lockless ndo_setup_tc for TC_SETUP_BLOCK/TC_SETUP_FT. NFT framework already takes care of most of the locking. Document the assumptions. ndo_setup_tc TC_SETUP_BLOCK nft_block_offload_cmd nft_chain_offload_cmd nft_flow_block_chain nft_flow_offload_chain nft_flow_rule_offload_abort nft_flow_rule_offload_commit nft_flow_rule_offload_commit nf_tables_commit nfnetlink_rcv_batch nfnetlink_rcv_skb_batch nfnetlink_rcv nft_offload_netdev_event NETDEV_UNREGISTER notifier ndo_setup_tc TC_SETUP_FT nf_flow_table_offload_cmd nf_flow_table_offload_setup nft_unregister_flowtable_hook nft_register_flowtable_net_hooks nft_flowtable_update nf_tables_newflowtable nfnetlink_rcv_batch (.call NFNL_CB_BATCH) nft_flowtable_update nf_tables_newflowtable nft_flowtable_event nf_tables_flowtable_event NETDEV_UNREGISTER notifier __nft_unregister_flowtable_net_hooks nft_unregister_flowtable_net_hooks nf_tables_commit nfnetlink_rcv_batch (.call NFNL_CB_BATCH) __nf_tables_abort nf_tables_abort nfnetlink_rcv_batch __nft_release_hook __nft_release_hooks nf_tables_pre_exit_net -> module unload nft_rcv_nl_event netlink_register_notifier (oh boy) nft_register_flowtable_net_hooks nft_flowtable_update nf_tables_newflowtable nf_tables_newflowtable Fixes: c4f0f30b424e ("net: hold netdev instance lock during nft ndo_setup_tc") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reported-by: syzbot+0afb4bcf91e5a1afdcad@syzkaller.appspotmail.com Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250308044726.1193222-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06docs: net: document new locking realityStanislav Fomichev
Also clarify ndo_get_stats (that read and write paths can run concurrently) and mention only RCU. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-14-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04netconsole: docs: document the task name featureBreno Leitao
Add documentation for the netconsole task name feature in Documentation/networking/netconsole.rst. This explains how to enable task name via configfs and demonstrates the output format. The documentation includes: - How to enable/disable the feature via taskname_enabled - The format of the task name in the output - An example showing the task name appearing in messages Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: rename netns_local to netns_immutableNicolas Dichtel
The name 'netns_local' is confusing. A following commit will export it via netlink, so let's use a more explicit name. Reported-by: Eric Dumazet <edumazet@google.com> Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc5). Conflicts: drivers/net/ethernet/cadence/macb_main.c fa52f15c745c ("net: cadence: macb: Synchronize stats calculations") 75696dd0fd72 ("net: cadence: macb: Convert to get_stats64") https://lore.kernel.org/20250224125848.68ee63e5@canb.auug.org.au Adjacent changes: drivers/net/ethernet/intel/ice/ice_sriov.c 79990cf5e7ad ("ice: Fix deinitializing VF in error path") a203163274a4 ("ice: simplify VF MSI-X managing") net/ipv4/tcp.c 18912c520674 ("tcp: devmem: don't write truncated dmabuf CMSGs to userspace") 297d389e9e5b ("net: prefix devmem specific helpers") net/mptcp/subflow.c 8668860b0ad3 ("mptcp: reset when MPTCP opts are dropped after join") c3349a22c200 ("mptcp: consolidate subflow cleanup") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26net: move aRFS rmap management and CPU affinity to coreAhmed Zaki
A common task for most drivers is to remember the user-set CPU affinity to its IRQs. On each netdev reset, the driver should re-assign the user's settings to the IRQs. Unify this task across all drivers by moving the CPU affinity to napi->config. However, to move the CPU affinity to core, we also need to move aRFS rmap management since aRFS uses its own IRQ notifiers. For the aRFS, add a new netdev flag "rx_cpu_rmap_auto". Drivers supporting aRFS should set the flag via netif_enable_cpu_rmap() and core will allocate and manage the aRFS rmaps. Freeing the rmap is also done by core when the netdev is freed. For better IRQ affinity management, move the IRQ rmap notifier inside the napi_struct and add new notify.notify and notify.release functions: netif_irq_cpu_rmap_notify() and netif_napi_affinity_release(). Now we have the aRFS rmap management in core, add CPU affinity mask to napi_config. To delegate the CPU affinity management to the core, drivers must: 1 - set the new netdev flag "irq_affinity_auto": netif_enable_irq_affinity(netdev) 2 - create the napi with persistent config: netif_napi_add_config() 3 - bind an IRQ to the napi instance: netif_napi_set_irq() the core will then make sure to use re-assign affinity to the napi's IRQ. The default IRQ mask is set to one cpu starting from the closest NUMA. Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://patch.msgid.link/20250224232228.990783-2-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26tcp: be less liberal in TSEcr received while in SYN_RECV stateEric Dumazet
Yong-Hao Zou mentioned that linux was not strict as other OS in 3WHS, for flows using TCP TS option (RFC 7323) As hinted by an old comment in tcp_check_req(), we can check the TSEcr value in the incoming packet corresponds to one of the SYNACK TSval values we have sent. In this patch, I record the oldest and most recent values that SYNACK packets have used. Send a challenge ACK if we receive a TSEcr outside of this range, and increase a new SNMP counter. nstat -az | grep TSEcrRejected TcpExtTSEcrRejected 0 0.0 Due to TCP fastopen implementation, do not apply yet these checks for fastopen flows. v2: No longer use req->num_timeout, but treq->snt_tsval_first to detect when first SYNACK is prepared. This means we make sure to not send an initial zero TSval. Make sure MPTCP and TCP selftests are passing. Change MIB name to TcpExtTSEcrRejected v1: https://lore.kernel.org/netdev/CADVnQykD8i4ArpSZaPKaoNxLJ2if2ts9m4As+=Jvdkrgx1qMHw@mail.gmail.com/T/ Reported-by: Yong-Hao Zou <yonghaoz1994@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250225171048.3105061-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26net: spider_net: Remove powerpc Cell driverMichael Ellerman
This driver can no longer be built since support for IBM Cell Blades was removed, in particular PPC_IBM_CELL_BLADE. Remove the driver and the documentation. Remove the MAINTAINERS entry, and add Ishizaki and Geoff to CREDITS. Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20241218105523.416573-24-mpe@ellerman.id.au
2025-02-25ethtool: Symmetric OR-XOR RSS hashGal Pressman
Add an additional type of symmetric RSS hash type: OR-XOR. The "Symmetric-OR-XOR" algorithm transforms the input as follows: (SRC_IP | DST_IP, SRC_IP ^ DST_IP, SRC_PORT | DST_PORT, SRC_PORT ^ DST_PORT) Change 'cap_rss_sym_xor_supported' to 'supported_input_xfrm', a bitmap of supported RXH_XFRM_* types. Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/20250224174416.499070-2-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-22batman-adv: Use consistent name for mesh interfaceSven Eckelmann
The way how the virtual interface is called inside the batman-adv source code is not consistent. The genl headers call it meshif and the rest of the code calls is (mostly) softif. The genl definitions cannot be touched because they are part of the UAPI. But the rest of the batman-adv code can be touched to have a consistent name again. The bulk of the renaming was done using sed -i -e 's/soft\(-\|\_\| \|\)i\([nf]\)/mesh\1i\2/g' \ -e 's/SOFT\(-\|\_\| \|\)I\([NF]\)/MESH\1I\2/g' and then it was adjusted slightly when proofreading the changes. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2025-02-21Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Martin KaFai Lau says: ==================== pull-request: bpf-next 2025-02-20 We've added 19 non-merge commits during the last 8 day(s) which contain a total of 35 files changed, 1126 insertions(+), 53 deletions(-). The main changes are: 1) Add TCP_RTO_MAX_MS support to bpf_set/getsockopt, from Jason Xing 2) Add network TX timestamping support to BPF sock_ops, from Jason Xing 3) Add TX metadata Launch Time support, from Song Yoong Siang * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: igc: Add launch time support to XDP ZC igc: Refactor empty frame insertion for launch time support net: stmmac: Add launch time support to XDP ZC selftests/bpf: Add launch time request to xdp_hw_metadata xsk: Add launch time hardware offload support to XDP Tx metadata selftests/bpf: Add simple bpf tests in the tx path for timestamping feature bpf: Support selective sampling for bpf timestamping bpf: Add BPF_SOCK_OPS_TSTAMP_SENDMSG_CB callback bpf: Add BPF_SOCK_OPS_TSTAMP_ACK_CB callback bpf: Add BPF_SOCK_OPS_TSTAMP_SND_HW_CB callback bpf: Add BPF_SOCK_OPS_TSTAMP_SND_SW_CB callback bpf: Add BPF_SOCK_OPS_TSTAMP_SCHED_CB callback net-timestamp: Prepare for isolating two modes of SO_TIMESTAMPING bpf: Disable unsafe helpers in TX timestamping callbacks bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback bpf: Prepare the sock_ops ctx and call bpf prog for TX timestamping bpf: Add networking timestamping support to bpf_get/setsockopt() selftests/bpf: Add rto max for bpf_setsockopt test bpf: Support TCP_RTO_MAX_MS for bpf_setsockopt ==================== Link: https://patch.msgid.link/20250221022104.386462-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21xfrm: provide common xdo_dev_offload_ok callback implementationLeon Romanovsky
Almost all drivers except bond and nsim had same check if device can perform XFRM offload on that specific packet. The check was that packet doesn't have IPv4 options and IPv6 extensions. In NIC drivers, the IPv4 HELEN comparison was slightly different, but the intent was to check for the same conditions. So let's chose more strict variant as a common base. Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2025-02-20Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull BPF fixes from Daniel Borkmann: - Fix a soft-lockup in BPF arena_map_free on 64k page size kernels (Alan Maguire) - Fix a missing allocation failure check in BPF verifier's acquire_lock_state (Kumar Kartikeya Dwivedi) - Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb to the raw_tp_null_args set (Kuniyuki Iwashima) - Fix a deadlock when freeing BPF cgroup storage (Abel Wu) - Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex (Andrii Nakryiko) - Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is accessing skb data not containing an Ethernet header (Shigeru Yoshida) - Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai) - Several BPF sockmap fixes to address incorrect TCP copied_seq calculations, which prevented correct data reads from recv(2) in user space (Jiayuan Chen) - Two fixes for BPF map lookup nullness elision (Daniel Xu) - Fix a NULL-pointer dereference from vmlinux BTF lookup in bpf_sk_storage_tracing_allowed (Jared Kangas) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests: bpf: test batch lookup on array of maps with holes bpf: skip non exist keys in generic_map_lookup_batch bpf: Handle allocation failure in acquire_lock_state bpf: verifier: Disambiguate get_constant_map_key() errors bpf: selftests: Test constant key extraction on irrelevant maps bpf: verifier: Do not extract constant map keys for irrelevant maps bpf: Fix softlockup in arena_map_free on 64k page kernel net: Add rx_skb of kfree_skb to raw_tp_null_args[]. bpf: Fix deadlock when freeing cgroup storage selftests/bpf: Add strparser test for bpf selftests/bpf: Fix invalid flag of recv() bpf: Disable non stream socket for strparser bpf: Fix wrong copied_seq calculation strparser: Add read_sock callback bpf: avoid holding freeze_mutex during mmap operation bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic selftests/bpf: Adjust data size to have ETH_HLEN bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type() bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed
2025-02-20xsk: Add launch time hardware offload support to XDP Tx metadataSong Yoong Siang
Extend the XDP Tx metadata framework so that user can requests launch time hardware offload, where the Ethernet device will schedule the packet for transmission at a pre-determined time called launch time. The value of launch time is communicated from user space to Ethernet driver via launch_time field of struct xsk_tx_metadata. Suggested-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250216093430.957880-2-yoong.siang.song@intel.com
2025-02-19can: j1939: Extend stack documentation with buffer size behaviorOleksij Rempel
Extend the J1939 stack documentation to include information about how buffer sizes influence stack behavior, detailing handling of simple sessions, TP, and ETP transfers. Additionally, describe various setsockopt(2) options, including their usage and potential error values that can be returned by the stack. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241013181715.3488980-1-o.rempel@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-02-18docs: networking: Allow creating cross-references statistics ABIMauro Carvalho Chehab
Now that Documentation/ABI is processed by automarkup, let it generate cross-references for the corresponding ABI file. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/a34ab9bef8f4e6b89dcb15098557fd3a7a9aa353.1739254867.git.mchehab+huawei@kernel.org
2025-02-17net: add documentation for io_uring zcrxDavid Wei
Add documentation for io_uring zero copy Rx that explains requirements and the user API. Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20250215000947.789731-11-dw@davidwei.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-14documentation: networking: Add NAPI configJoe Damato
Document the existence of persistent per-NAPI configuration space and the API that drivers can opt into. Update stale documentation which suggested that NAPI IDs cannot be queried from userspace. Signed-off-by: Joe Damato <jdamato@fastly.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20250213191535.38792-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13Documentation: dpaa2 ethernet switch driver: Fix spellingRitvik Gupta
Corrected spelling mistake Signed-off-by: Ritvik Gupta <ritvikfoss@gmail.com> Link: https://patch.msgid.link/20250212021311.13257-1-ritvikfoss@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-12net/mlx5: Expose ICM consumption per functionAkiva Goldberger
ICM is a portion of the host's memory assigned to a function by the OS through requests made by the NIC's firmware. PF ICM consumption can be accessed directly, while VF/SF ICM consumption can be accessed through their representors in switchdev mode. The value is exposed to the user in granularity of 4KB through the vnic health reporter as follows: $ devlink health diagnose pci/0000:08:00.0 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 icm_consumption: 1032 Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250209101716.112774-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11sfc: document devlink flash supportEdward Cree
Update the information in sfc's devlink documentation including support for firmware update with devlink flash. Also update the help text for CONFIG_SFC_MTD, as it is no longer strictly required for firmware updates. Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://patch.msgid.link/3476b0ef04a0944f03e0b771ec8ed1a9c70db4dc.1739186253.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-11tcp: add tcp_rto_max_ms sysctlEric Dumazet
Previous patch added a TCP_RTO_MAX_MS socket option to tune a TCP socket max RTO value. Many setups prefer to change a per netns sysctl. This patch adds /proc/sys/net/ipv4/tcp_rto_max_ms Its initial value is 120000 (120 seconds). Keep in mind that a decrease of tcp_rto_max_ms means shorter overall timeouts, unless tcp_retries2 sysctl is increased. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11tcp: add the ability to control max RTOEric Dumazet
Currently, TCP stack uses a constant (120 seconds) to limit the RTO value exponential growth. Some applications want to set a lower value. Add TCP_RTO_MAX_MS socket option to set a value (in ms) between 1 and 120 seconds. It is discouraged to change the socket rto max on a live socket, as it might lead to unexpected disconnects. Following patch is adding a netns sysctl to control the default value at socket creation time. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-10netconsole: docs: Add documentation for CPU number auto-populationBreno Leitao
Update the netconsole documentation to explain the new feature that allows automatic population of the CPU number. The key changes include introducing a new section titled "CPU number auto population in userdata", explaining how to enable the CPU number auto-population feature by writing to the "populate_cpu_nr" file in the netconsole configfs hierarchy. This documentation update ensures users are aware of the new CPU number auto-population functionality and how to leverage it for better demultiplexing and visibility of parallel netconsole output. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-08Documentation/networking: fix basic node example document ISO 15765-2Reyders Morales
In the current struct sockaddr_can tp is member of can_addr. tp is not member of struct sockaddr_can. Signed-off-by: Reyders Morales <reyders1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20250203224720.42530-1-reyders1@gmail.com Fixes: 67711e04254c ("Documentation: networking: document ISO 15765-2") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-02-05ice: devlink PF MSI-X max and min parameterMichal Swiatkowski
Use generic devlink PF MSI-X parameter to allow user to change MSI-X range. Add notes about this parameters into ice devlink documentation. Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-30doc: mptcp: sysctl: blackhole_timeout is per-netnsMatthieu Baerts (NGI0)
All other sysctl entries mention it, and it is a per-namespace sysctl. So mention it as well. Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-29strparser: Add read_sock callbackJiayuan Chen
Added a new read_sock handler, allowing users to customize read operations instead of relying on the native socket's read_sock. Signed-off-by: Jiayuan Chen <mrpre@163.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://patch.msgid.link/20250122100917.49845-2-mrpre@163.com
2025-01-27documentation: networking: fix spelling mistakesKhaled Elnaggar
Fix a couple of typos/spelling mistakes in the documentation. Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux@gmail.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Link: https://patch.msgid.link/20250123082521.59997-1-khaledelnaggarlinux@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-18Merge tag 'batadv-next-pullrequest-20250117' of ↵Jakub Kicinski
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Reorder includes for distributed-arp-table.c, by Sven Eckelmann - Fix translation table change handling, by Remi Pommarel (2 patches) - Map VID 0 to untagged TT VLAN, by Sven Eckelmann - Update MAINTAINERS/mailmap e-mail addresses, by the respective authors (4 patches) - netlink: reduce duplicate code by returning interfaces, by Linus Lüssing * tag 'batadv-next-pullrequest-20250117' of git://git.open-mesh.org/linux-merge: batman-adv: netlink: reduce duplicate code by returning interfaces MAINTAINERS: mailmap: add entries for Antonio Quartulli mailmap: add entries for Sven Eckelmann mailmap: add entries for Simon Wunderlich MAINTAINERS: update email address of Marek Linder batman-adv: Map VID 0 to untagged TT VLAN batman-adv: Don't keep redundant TT change events batman-adv: Remove atomic usage for tt.local_changes batman-adv: Reorder includes for distributed-arp-table.c batman-adv: Start new development cycle ==================== Link: https://patch.msgid.link/20250117123910.219278-1-sw@simonwunderlich.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-18mptcp: sysctl: add syn_retrans_before_tcp_fallbackMatthieu Baerts (NGI0)
The number of SYN + MPC retransmissions before falling back to TCP was fixed to 2. This is certainly a good default value, but having a fixed number can be a problem in some environments. The current behaviour means that if all packets are dropped, there will be: - The initial SYN + MPC - 2 retransmissions with MPC - The next ones will be without MPTCP. So typically ~3 seconds before falling back to TCP. In some networks where some temporally blackholes are unfortunately frequent, or when a client tries to initiate connections while the network is not ready yet, this can cause new connections not to have MPTCP connections. In such environments, it is now possible to increase the number of SYN retransmissions with MPTCP options to make sure MPTCP is used. Interesting values are: - 0: the first retransmission will be done without MPTCP options: quite aggressive, but also a higher risk of detecting false-positive MPTCP blackholes. - >= 128: all SYN retransmissions will keep the MPTCP options: back to the < 6.12 behaviour. The default behaviour is not changed here. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250117-net-next-mptcp-syn_retrans_before_tcp_fallback-v1-1-ab4b187099b0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-17net: ethtool: ts: add separate counter for unconfirmed one-step TX timestampsVladimir Oltean
For packets with two-step timestamp requests, the hardware timestamp comes back to the driver through a confirmation mechanism of sorts, which allows the driver to confidently bump the successful "pkts" counter. For one-step PTP, the NIC is supposed to autonomously insert its hardware TX timestamp in the packet headers while simultaneously transmitting it. There may be a confirmation that this was done successfully, or there may not. None of the current drivers which implement ethtool_ops :: get_ts_stats() also support HWTSTAMP_TX_ONESTEP_SYNC or HWTSTAMP_TX_ONESTEP_SYNC, so it is a bit unclear which model to follow. But there are NICs, such as DSA, where there is no transmit confirmation at all. Here, it would be wrong / misleading to increment the successful "pkts" counter, because one-step PTP packets can be dropped on TX just like any other packets. So introduce a special counter which signifies "yes, an attempt was made, but we don't know whether it also exited the port or not". I expect that for one-step PTP packets where a confirmation is available, the "pkts" counter would be bumped. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250116104628.123555-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: ethtool: add support for configuring hds-threshTaehee Yoo
The hds-thresh option configures the threshold value of the header-data-split. If a received packet size is larger than this threshold value, a packet will be split into header and payload. The header indicates TCP and UDP header, but it depends on driver spec. The bnxt_en driver supports HDS(Header-Data-Split) configuration at FW level, affecting TCP and UDP too. So, If hds-thresh is set, it affects UDP and TCP packets. Example: # ethtool -G <interface name> hds-thresh <value> # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256 # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... HDS thresh: 1023 Current hardware settings: ... TCP data split: on HDS thresh: 256 The default/min/max values are not defined in the ethtool so the drivers should define themself. The 0 value means that all TCP/UDP packets' header and payload will be split. Tested-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-3-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14Documentation: networking: update PHY error counter diagnostics in twisted ↵Oleksij Rempel
pair guide Replace generic instructions for monitoring error counters with a procedure using the unified PHY statistics interface (`--all-groups`). Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14net: ethtool: add support for structured PHY statisticsJakub Kicinski
Introduce a new way to report PHY statistics in a structured and standardized format using the netlink API. This new method does not replace the old driver-specific stats, which can still be accessed with `ethtool -S <eth name>`. The structured stats are available with `ethtool -S <eth name> --all-groups`. This new method makes it easier to diagnose problems by organizing stats in a consistent and documented way. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-13net/mlx5: fs, add HWS to steering mode optionsMoshe Shemesh
Add HW Steering mode to mlx5 devlink param of steering mode options. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250109160546.1733647-14-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10Merge tag 'ipsec-next-2025-01-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next-2025-01-09 1) Implement the AGGFRAG protocol and basic IP-TFS (RFC9347) functionality. From Christian Hopps. 2) Support ESN context update to hardware for TX. From Jianbo Liu. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-09tools: ynl: move python code to separate sub-directoryJan Stancek
Move python code to a separate directory so it can be packaged as a python module. Updates existing references in selftests and docs. Also rename ynl-gen-[c|rst] to ynl_gen_[c|rst], avoid dashes as these prevent easy imports for entrypoints. Signed-off-by: Jan Stancek <jstancek@redhat.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/a4151bad0e6984e7164d395125ce87fd2e048bf1.1736343575.git.jstancek@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09netdev: define NETDEV_INTERNALJakub Kicinski
Linus suggested during one of past maintainer summits (in context of a DMA_BUF discussion) that symbol namespaces can be used to prevent unwelcome but in-tree code from using all exported functions. Create a namespace for netdev. Export netdev_rx_queue_restart(), drivers may want to use it since it gives them a simple and safe way to restart a queue to apply config changes. But it's both too low level and too actively developed to be used outside netdev. Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-07i40e: add ability to reset VF for Tx and Rx MDD eventsAleksandr Loktionov
Implement "mdd-auto-reset-vf" priv-flag to handle Tx and Rx MDD events for VFs. This flag is also used in other network adapters like ICE. Usage: - "on" - The problematic VF will be automatically reset if a malformed descriptor is detected. - "off" - The problematic VF will be disabled. In cases where a VF sends malformed packets classified as malicious, it can cause the Tx queue to freeze, rendering it unusable for several minutes. When an MDD event occurs, this new implementation allows for a graceful VF reset to quickly restore operational state. Currently, VF queues are disabled if an MDD event occurs. This patch adds the ability to reset the VF if a Tx or Rx MDD event occurs. It also includes MDD event logging throttling to avoid dmesg pollution and unifies the format of Tx and Rx MDD messages. Note: Standard message rate limiting functions like dev_info_ratelimited() do not meet our requirements. Custom rate limiting is implemented, please see the code for details. Co-developed-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Co-developed-by: Padraig J Connolly <padraig.j.connolly@intel.com> Signed-off-by: Padraig J Connolly <padraig.j.connolly@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250106221929.956999-13-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07xfrm: Support ESN context update to hardware for TXJianbo Liu
Previously xfrm_dev_state_advance_esn() was added for RX only. But it's possible that ESN context also need to be synced to hardware for TX, so call it for outbound in this patch. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>