diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2025-01-22 08:28:57 -0800 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2025-01-22 08:28:57 -0800 |
| commit | 0ad9617c78acbc71373fb341a6f75d4012b01d69 (patch) | |
| tree | 602d7c9ec86d9a4891a96a2996af6e4368a647eb /drivers/net/netdevsim/netdev.c | |
| parent | 5f537664e705b0bf8b7e329861f20128534f6a83 (diff) | |
| parent | cf33d96f50903214226b379b3f10d1f262dae018 (diff) | |
Merge tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"This is slightly smaller than usual, with the most interesting work
being still around RTNL scope reduction.
Core:
- More core refactoring to reduce the RTNL lock contention, including
preparatory work for the per-network namespace RTNL lock, replacing
RTNL lock with a per device-one to protect NAPI-related net device
data and moving synchronize_net() calls outside such lock.
- Extend drop reasons usage, adding net scheduler, AF_UNIX, bridge
and more specific TCP coverage.
- Reduce network namespace tear-down time by removing per-subsystems
synchronize_net() in tipc and sched.
- Add flow label selector support for fib rules, allowing traffic
redirection based on such header field.
Netfilter:
- Do not remove netdev basechain when last device is gone, allowing
netdev basechains without devices.
- Revisit the flowtable teardown strategy, dealing better with fin,
reset and re-open events.
- Scale-up IP-vs connection dumping by avoiding linear search on each
restart.
Protocols:
- A significant XDP socket refactor, consolidating and optimizing
several helpers into the core
- Better scaling of ICMP rate-limiting, by removing false-sharing in
inet peers handling.
- Introduces netlink notifications for multicast IPv4 and IPv6
address changes.
- Add ipsec support for IP-TFS/AggFrag encapsulation, allowing
aggregation and fragmentation of the inner IP.
- Add sysctl to configure TIME-WAIT reuse delay for TCP sockets, to
avoid local port exhaustion issues when the average connection
lifetime is very short.
- Support updating keys (re-keying) for connections using kernel TLS
(for TLS 1.3 only).
- Support ipv4-mapped ipv6 address clients in smc-r v2.
- Add support for jumbo data packet transmission in RxRPC sockets,
gluing multiple data packets in a single UDP packet.
- Support RxRPC RACK-TLP to manage packet loss and retransmission in
conjunction with the congestion control algorithm.
Driver API:
- Introduce a unified and structured interface for reporting PHY
statistics, exposing consistent data across different H/W via
ethtool.
- Make timestamping selectable, allow the user to select the desired
hwtstamp provider (PHY or MAC) administratively.
- Add support for configuring a header-data-split threshold (HDS)
value via ethtool, to deal with partial or buggy H/W
implementation.
- Consolidate DSA drivers Energy Efficiency Ethernet support.
- Add EEE management to phylink, making use of the phylib
implementation.
- Add phylib support for in-band capabilities negotiation.
- Simplify how phylib-enabled mac drivers expose the supported
interfaces.
Tests and tooling:
- Make the YNL tool package-friendly to make it easier to deploy it
separately from the kernel.
- Increase TCP selftest coverage importing several packetdrill
test-cases.
- Regenerate the ethtool uapi header from the YNL spec, to ease
maintenance and future development.
- Add YNL support for decoding the link types used in net self-tests,
allowing a single build to run both net and drivers/net.
Drivers:
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- add cross E-Switch QoS support
- add SW Steering support for ConnectX-8
- implement support for HW-Managed Flow Steering, improving the
rule deletion/insertion rate
- support for multi-host LAG
- Intel (ixgbe, ice, igb):
- ice: add support for devlink health events
- ixgbe: add initial support for E610 chipset variant
- igb: add support for AF_XDP zero-copy
- Meta:
- add support for basic RSS config
- allow changing the number of channels
- add hardware monitoring support
- Broadcom (bnxt):
- implement TCP data split and HDS threshold ethtool support,
enabling Device Memory TCP.
- Marvell Octeon:
- implement egress ipsec offload support for the cn10k family
- Hisilicon (HIBMC):
- implement unicast MAC filtering
- Ethernet NICs embedded and virtual:
- Convert UDP tunnel drivers to NETDEV_PCPU_STAT_DSTATS, avoiding
contented atomic operations for drop counters
- Freescale:
- quicc: phylink conversion
- enetc: support Tx and Rx checksum offload and improve TSO
performances
- MediaTek:
- airoha: introduce support for ETS and HTB Qdisc offload
- Microchip:
- lan78XX USB: preparation work for phylink conversion
- Synopsys (stmmac):
- support DWMAC IP on NXP Automotive SoCs S32G2xx/S32G3xx/S32R45
- refactor EEE support to leverage the new driver API
- optimize DMA and cache access to increase raw RX performances
by 40%
- TI:
- icssg-prueth: add multicast filtering support for VLAN
interface
- netkit:
- add ability to configure head/tailroom
- VXLAN:
- accepts packets with user-defined reserved bit
- Ethernet switches:
- Microchip:
- lan969x: add RGMII support
- lan969x: improve TX and RX performance using the FDMA engine
- nVidia/Mellanox:
- move Tx header handling to PCI driver, to ease XDP support
- Ethernet PHYs:
- Texas Instruments DP83822:
- add support for GPIO2 clock output
- Realtek:
- 8169: add support for RTL8125D rev.b
- rtl822x: add hwmon support for the temperature sensor
- Microchip:
- add support for RDS PTP hardware
- consolidate periodic output signal generation
- CAN:
- several DT-bindings to DT schema conversions
- tcan4x5x:
- add HW standby support
- support nWKRQ voltage selection
- kvaser:
- allowing Bus Error Reporting runtime configuration
- WiFi:
- the on-going Multi-Link Operation (MLO) effort continues,
affecting both the stack and in drivers
- mac80211/cfg80211:
- Emergency Preparedness Communication Services (EPCS) station
mode support
- support for adding and removing station links for MLO
- add support for WiFi 7/EHT mesh over 320 MHz channels
- report Tx power info for each link
- RealTek (rtw88):
- enable USB Rx aggregation and USB 3 to improve performance
- LED support
- RealTek (rtw89):
- refactor power save to support Multi-Link Operations
- add support for RTL8922AE-VS variant
- MediaTek (mt76):
- single wiphy multiband support (preparation for MLO)
- p2p device support
- add TP-Link TXE50UH USB adapter support
- Qualcomm (ath10k):
- support for the QCA6698AQ IP core
- Qualcomm (ath12k):
- enable MLO for QCN9274
- Bluetooth:
- Allow sysfs to trigger hdev reset, to allow recovering devices
not responsive from user-space
- MediaTek: add support for MT7922, MT7925, MT7921e devices
- Realtek: add support for RTL8851BE devices
- Qualcomm: add support for WCN785x devices
- ISO: allow BIG re-sync"
* tag 'net-next-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1386 commits)
net/rose: prevent integer overflows in rose_setsockopt()
net: phylink: fix regression when binding a PHY
net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup
net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup
net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path
ipv6: Convert inet6_rtm_deladdr() to per-netns RTNL.
ipv6: Convert inet6_rtm_newaddr() to per-netns RTNL.
ipv6: Move lifetime validation to inet6_rtm_newaddr().
ipv6: Set cfg.ifa_flags before device lookup in inet6_rtm_newaddr().
ipv6: Pass dev to inet6_addr_add().
ipv6: Convert inet6_ioctl() to per-netns RTNL.
ipv6: Hold rtnl_net_lock() in addrconf_init() and addrconf_cleanup().
ipv6: Hold rtnl_net_lock() in addrconf_dad_work().
ipv6: Hold rtnl_net_lock() in addrconf_verify_work().
ipv6: Convert net.ipv6.conf.${DEV}.XXX sysctl to per-netns RTNL.
ipv6: Add __in6_dev_get_rtnl_net().
net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags
net: mii: Fix the Speed display when the network cable is not connected
sysctl net: Remove macro checks for CONFIG_SYSCTL
eth: bnxt: update header sizing defaults
...
Diffstat (limited to 'drivers/net/netdevsim/netdev.c')
| -rw-r--r-- | drivers/net/netdevsim/netdev.c | 268 |
1 files changed, 234 insertions, 34 deletions
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c index e068a9761c09..42f247cbdcee 100644 --- a/drivers/net/netdevsim/netdev.c +++ b/drivers/net/netdevsim/netdev.c @@ -15,11 +15,13 @@ #include <linux/debugfs.h> #include <linux/etherdevice.h> +#include <linux/ethtool_netlink.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/netdevice.h> #include <linux/slab.h> #include <net/netdev_queues.h> +#include <net/netdev_rx_queue.h> #include <net/page_pool/helpers.h> #include <net/netlink.h> #include <net/net_shaper.h> @@ -29,6 +31,8 @@ #include "netdevsim.h" +MODULE_IMPORT_NS("NETDEV_INTERNAL"); + #define NSIM_RING_SIZE 256 static int nsim_napi_rx(struct nsim_rq *rq, struct sk_buff *skb) @@ -54,6 +58,7 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev) struct net_device *peer_dev; unsigned int len = skb->len; struct netdevsim *peer_ns; + struct netdev_config *cfg; struct nsim_rq *rq; int rxq; @@ -69,7 +74,14 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev) rxq = skb_get_queue_mapping(skb); if (rxq >= peer_dev->num_rx_queues) rxq = rxq % peer_dev->num_rx_queues; - rq = &peer_ns->rq[rxq]; + rq = peer_ns->rq[rxq]; + + cfg = peer_dev->cfg; + if (skb_is_nonlinear(skb) && + (cfg->hds_config != ETHTOOL_TCP_DATA_SPLIT_ENABLED || + (cfg->hds_config == ETHTOOL_TCP_DATA_SPLIT_ENABLED && + cfg->hds_thresh > len))) + skb_linearize(skb); skb_tx_timestamp(skb); if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP)) @@ -359,25 +371,24 @@ static int nsim_poll(struct napi_struct *napi, int budget) return done; } -static int nsim_create_page_pool(struct nsim_rq *rq) +static int nsim_create_page_pool(struct page_pool **p, struct napi_struct *napi) { - struct page_pool_params p = { + struct page_pool_params params = { .order = 0, .pool_size = NSIM_RING_SIZE, .nid = NUMA_NO_NODE, - .dev = &rq->napi.dev->dev, - .napi = &rq->napi, + .dev = &napi->dev->dev, + .napi = napi, .dma_dir = DMA_BIDIRECTIONAL, - .netdev = rq->napi.dev, + .netdev = napi->dev, }; + struct page_pool *pool; - rq->page_pool = page_pool_create(&p); - if (IS_ERR(rq->page_pool)) { - int err = PTR_ERR(rq->page_pool); + pool = page_pool_create(¶ms); + if (IS_ERR(pool)) + return PTR_ERR(pool); - rq->page_pool = NULL; - return err; - } + *p = pool; return 0; } @@ -388,15 +399,15 @@ static int nsim_init_napi(struct netdevsim *ns) int err, i; for (i = 0; i < dev->num_rx_queues; i++) { - rq = &ns->rq[i]; + rq = ns->rq[i]; - netif_napi_add(dev, &rq->napi, nsim_poll); + netif_napi_add_config(dev, &rq->napi, nsim_poll, i); } for (i = 0; i < dev->num_rx_queues; i++) { - rq = &ns->rq[i]; + rq = ns->rq[i]; - err = nsim_create_page_pool(rq); + err = nsim_create_page_pool(&rq->page_pool, &rq->napi); if (err) goto err_pp_destroy; } @@ -405,12 +416,12 @@ static int nsim_init_napi(struct netdevsim *ns) err_pp_destroy: while (i--) { - page_pool_destroy(ns->rq[i].page_pool); - ns->rq[i].page_pool = NULL; + page_pool_destroy(ns->rq[i]->page_pool); + ns->rq[i]->page_pool = NULL; } for (i = 0; i < dev->num_rx_queues; i++) - __netif_napi_del(&ns->rq[i].napi); + __netif_napi_del(&ns->rq[i]->napi); return err; } @@ -421,7 +432,7 @@ static void nsim_enable_napi(struct netdevsim *ns) int i; for (i = 0; i < dev->num_rx_queues; i++) { - struct nsim_rq *rq = &ns->rq[i]; + struct nsim_rq *rq = ns->rq[i]; netif_queue_set_napi(dev, i, NETDEV_QUEUE_TYPE_RX, &rq->napi); napi_enable(&rq->napi); @@ -448,7 +459,7 @@ static void nsim_del_napi(struct netdevsim *ns) int i; for (i = 0; i < dev->num_rx_queues; i++) { - struct nsim_rq *rq = &ns->rq[i]; + struct nsim_rq *rq = ns->rq[i]; napi_disable(&rq->napi); __netif_napi_del(&rq->napi); @@ -456,8 +467,8 @@ static void nsim_del_napi(struct netdevsim *ns) synchronize_net(); for (i = 0; i < dev->num_rx_queues; i++) { - page_pool_destroy(ns->rq[i].page_pool); - ns->rq[i].page_pool = NULL; + page_pool_destroy(ns->rq[i]->page_pool); + ns->rq[i]->page_pool = NULL; } } @@ -595,6 +606,182 @@ static const struct netdev_stat_ops nsim_stat_ops = { .get_base_stats = nsim_get_base_stats, }; +static struct nsim_rq *nsim_queue_alloc(void) +{ + struct nsim_rq *rq; + + rq = kzalloc(sizeof(*rq), GFP_KERNEL_ACCOUNT); + if (!rq) + return NULL; + + skb_queue_head_init(&rq->skb_queue); + return rq; +} + +static void nsim_queue_free(struct nsim_rq *rq) +{ + skb_queue_purge_reason(&rq->skb_queue, SKB_DROP_REASON_QUEUE_PURGE); + kfree(rq); +} + +/* Queue reset mode is controlled by ns->rq_reset_mode. + * - normal - new NAPI new pool (old NAPI enabled when new added) + * - mode 1 - allocate new pool (NAPI is only disabled / enabled) + * - mode 2 - new NAPI new pool (old NAPI removed before new added) + * - mode 3 - new NAPI new pool (old NAPI disabled when new added) + */ +struct nsim_queue_mem { + struct nsim_rq *rq; + struct page_pool *pp; +}; + +static int +nsim_queue_mem_alloc(struct net_device *dev, void *per_queue_mem, int idx) +{ + struct nsim_queue_mem *qmem = per_queue_mem; + struct netdevsim *ns = netdev_priv(dev); + int err; + + if (ns->rq_reset_mode > 3) + return -EINVAL; + + if (ns->rq_reset_mode == 1) + return nsim_create_page_pool(&qmem->pp, &ns->rq[idx]->napi); + + qmem->rq = nsim_queue_alloc(); + if (!qmem->rq) + return -ENOMEM; + + err = nsim_create_page_pool(&qmem->rq->page_pool, &qmem->rq->napi); + if (err) + goto err_free; + + if (!ns->rq_reset_mode) + netif_napi_add_config(dev, &qmem->rq->napi, nsim_poll, idx); + + return 0; + +err_free: + nsim_queue_free(qmem->rq); + return err; +} + +static void nsim_queue_mem_free(struct net_device *dev, void *per_queue_mem) +{ + struct nsim_queue_mem *qmem = per_queue_mem; + struct netdevsim *ns = netdev_priv(dev); + + page_pool_destroy(qmem->pp); + if (qmem->rq) { + if (!ns->rq_reset_mode) + netif_napi_del(&qmem->rq->napi); + page_pool_destroy(qmem->rq->page_pool); + nsim_queue_free(qmem->rq); + } +} + +static int +nsim_queue_start(struct net_device *dev, void *per_queue_mem, int idx) +{ + struct nsim_queue_mem *qmem = per_queue_mem; + struct netdevsim *ns = netdev_priv(dev); + + if (ns->rq_reset_mode == 1) { + ns->rq[idx]->page_pool = qmem->pp; + napi_enable(&ns->rq[idx]->napi); + return 0; + } + + /* netif_napi_add()/_del() should normally be called from alloc/free, + * here we want to test various call orders. + */ + if (ns->rq_reset_mode == 2) { + netif_napi_del(&ns->rq[idx]->napi); + netif_napi_add_config(dev, &qmem->rq->napi, nsim_poll, idx); + } else if (ns->rq_reset_mode == 3) { + netif_napi_add_config(dev, &qmem->rq->napi, nsim_poll, idx); + netif_napi_del(&ns->rq[idx]->napi); + } + + ns->rq[idx] = qmem->rq; + napi_enable(&ns->rq[idx]->napi); + + return 0; +} + +static int nsim_queue_stop(struct net_device *dev, void *per_queue_mem, int idx) +{ + struct nsim_queue_mem *qmem = per_queue_mem; + struct netdevsim *ns = netdev_priv(dev); + + napi_disable(&ns->rq[idx]->napi); + + if (ns->rq_reset_mode == 1) { + qmem->pp = ns->rq[idx]->page_pool; + page_pool_disable_direct_recycling(qmem->pp); + } else { + qmem->rq = ns->rq[idx]; + } + + return 0; +} + +static const struct netdev_queue_mgmt_ops nsim_queue_mgmt_ops = { + .ndo_queue_mem_size = sizeof(struct nsim_queue_mem), + .ndo_queue_mem_alloc = nsim_queue_mem_alloc, + .ndo_queue_mem_free = nsim_queue_mem_free, + .ndo_queue_start = nsim_queue_start, + .ndo_queue_stop = nsim_queue_stop, +}; + +static ssize_t +nsim_qreset_write(struct file *file, const char __user *data, + size_t count, loff_t *ppos) +{ + struct netdevsim *ns = file->private_data; + unsigned int queue, mode; + char buf[32]; + ssize_t ret; + + if (count >= sizeof(buf)) + return -EINVAL; + if (copy_from_user(buf, data, count)) + return -EFAULT; + buf[count] = '\0'; + + ret = sscanf(buf, "%u %u", &queue, &mode); + if (ret != 2) + return -EINVAL; + + rtnl_lock(); + if (!netif_running(ns->netdev)) { + ret = -ENETDOWN; + goto exit_unlock; + } + + if (queue >= ns->netdev->real_num_rx_queues) { + ret = -EINVAL; + goto exit_unlock; + } + + ns->rq_reset_mode = mode; + ret = netdev_rx_queue_restart(ns->netdev, queue); + ns->rq_reset_mode = 0; + if (ret) + goto exit_unlock; + + ret = count; +exit_unlock: + rtnl_unlock(); + return ret; +} + +static const struct file_operations nsim_qreset_fops = { + .open = simple_open, + .write = nsim_qreset_write, + .owner = THIS_MODULE, +}; + static ssize_t nsim_pp_hold_read(struct file *file, char __user *data, size_t count, loff_t *ppos) @@ -628,7 +815,7 @@ nsim_pp_hold_write(struct file *file, const char __user *data, if (!netif_running(ns->netdev) && val) { ret = -ENETDOWN; } else if (val) { - ns->page = page_pool_dev_alloc_pages(ns->rq[0].page_pool); + ns->page = page_pool_dev_alloc_pages(ns->rq[0]->page_pool); if (!ns->page) ret = -ENOMEM; } else { @@ -677,27 +864,35 @@ static int nsim_queue_init(struct netdevsim *ns) struct net_device *dev = ns->netdev; int i; - ns->rq = kvcalloc(dev->num_rx_queues, sizeof(*ns->rq), - GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); + ns->rq = kcalloc(dev->num_rx_queues, sizeof(*ns->rq), + GFP_KERNEL_ACCOUNT); if (!ns->rq) return -ENOMEM; - for (i = 0; i < dev->num_rx_queues; i++) - skb_queue_head_init(&ns->rq[i].skb_queue); + for (i = 0; i < dev->num_rx_queues; i++) { + ns->rq[i] = nsim_queue_alloc(); + if (!ns->rq[i]) + goto err_free_prev; + } return 0; + +err_free_prev: + while (i--) + kfree(ns->rq[i]); + kfree(ns->rq); + return -ENOMEM; } -static void nsim_queue_free(struct netdevsim *ns) +static void nsim_queue_uninit(struct netdevsim *ns) { struct net_device *dev = ns->netdev; int i; for (i = 0; i < dev->num_rx_queues; i++) - skb_queue_purge_reason(&ns->rq[i].skb_queue, - SKB_DROP_REASON_QUEUE_PURGE); + nsim_queue_free(ns->rq[i]); - kvfree(ns->rq); + kfree(ns->rq); ns->rq = NULL; } @@ -713,6 +908,7 @@ static int nsim_init_netdevsim(struct netdevsim *ns) ns->phc = phc; ns->netdev->netdev_ops = &nsim_netdev_ops; ns->netdev->stat_ops = &nsim_stat_ops; + ns->netdev->queue_mgmt_ops = &nsim_queue_mgmt_ops; err = nsim_udp_tunnels_info_create(ns->nsim_dev, ns->netdev); if (err) @@ -741,7 +937,7 @@ err_ipsec_teardown: nsim_macsec_teardown(ns); nsim_bpf_uninit(ns); err_rq_destroy: - nsim_queue_free(ns); + nsim_queue_uninit(ns); err_utn_destroy: rtnl_unlock(); nsim_udp_tunnels_info_destroy(ns->netdev); @@ -798,6 +994,9 @@ nsim_create(struct nsim_dev *nsim_dev, struct nsim_dev_port *nsim_dev_port) ns->pp_dfs = debugfs_create_file("pp_hold", 0600, nsim_dev_port->ddir, ns, &nsim_pp_hold_fops); + ns->qr_dfs = debugfs_create_file("queue_reset", 0200, + nsim_dev_port->ddir, ns, + &nsim_qreset_fops); return ns; @@ -811,6 +1010,7 @@ void nsim_destroy(struct netdevsim *ns) struct net_device *dev = ns->netdev; struct netdevsim *peer; + debugfs_remove(ns->qr_dfs); debugfs_remove(ns->pp_dfs); rtnl_lock(); @@ -823,7 +1023,7 @@ void nsim_destroy(struct netdevsim *ns) nsim_macsec_teardown(ns); nsim_ipsec_teardown(ns); nsim_bpf_uninit(ns); - nsim_queue_free(ns); + nsim_queue_uninit(ns); } rtnl_unlock(); if (nsim_dev_port_is_pf(ns->nsim_dev_port)) |
