Diffstat (limited to 'Documentation/networking')
| -rw-r--r-- | Documentation/networking/device_drivers/ethernet/index.rst | 1 |
| -rw-r--r-- | Documentation/networking/device_drivers/ethernet/neterion/s2io.rst | 196 |
| -rw-r--r-- | Documentation/networking/index.rst | 1 |
| -rw-r--r-- | Documentation/networking/iou-zcrx.rst | 20 |
| -rw-r--r-- | Documentation/networking/ip-sysctl.rst | 4 |
| -rw-r--r-- | Documentation/networking/net_cachelines/tcp_sock.rst | 1 |
| -rw-r--r-- | Documentation/networking/netdevices.rst | 4 |
| -rw-r--r-- | Documentation/networking/phy-port.rst | 111 |
| -rw-r--r-- | Documentation/networking/phy.rst | 22 |
| -rw-r--r-- | Documentation/networking/scaling.rst | 12 |
| -rw-r--r-- | Documentation/networking/timestamping.rst | 7 |
| -rw-r--r-- | Documentation/networking/tls-offload.rst | 30 |
12 files changed, 180 insertions, 229 deletions
diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst index 142ac0bf781b..5f3f06111911 100644 --- a/Documentation/networking/device_drivers/ethernet/index.rst +++ b/Documentation/networking/device_drivers/ethernet/index.rst @@ -48,7 +48,6 @@ Contents: meta/fbnic microsoft/netvsc mucse/rnpgbe - neterion/s2io netronome/nfp pensando/ionic pensando/ionic_rdma diff --git a/Documentation/networking/device_drivers/ethernet/neterion/s2io.rst b/Documentation/networking/device_drivers/ethernet/neterion/s2io.rst deleted file mode 100644 index d731b5a98561..000000000000 --- a/Documentation/networking/device_drivers/ethernet/neterion/s2io.rst +++ /dev/null @@ -1,196 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 - -========================================================= -Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver -========================================================= - -Release notes for Neterion's (Formerly S2io) Xframe I/II PCI-X 10GbE driver. - -.. Contents - - 1. Introduction - - 2. Identifying the adapter/interface - - 3. Features supported - - 4. Command line parameters - - 5. Performance suggestions - - 6. Available Downloads - - -1. Introduction -=============== -This Linux driver supports Neterion's Xframe I PCI-X 1.0 and -Xframe II PCI-X 2.0 adapters. It supports several features -such as jumbo frames, MSI/MSI-X, checksum offloads, TSO, UFO and so on. -See below for complete list of features. - -All features are supported for both IPv4 and IPv6. - -2. Identifying the adapter/interface -==================================== - -a. Insert the adapter(s) in your system. -b. Build and load driver:: - - # insmod s2io.ko - -c. 
View log messages:: - - # dmesg | tail -40 - -You will see messages similar to:: - - eth3: Neterion Xframe I 10GbE adapter (rev 3), Version 2.0.9.1, Intr type INTA - eth4: Neterion Xframe II 10GbE adapter (rev 2), Version 2.0.9.1, Intr type INTA - eth4: Device is on 64 bit 133MHz PCIX(M1) bus - -The above messages identify the adapter type(Xframe I/II), adapter revision, -driver version, interface name(eth3, eth4), Interrupt type(INTA, MSI, MSI-X). -In case of Xframe II, the PCI/PCI-X bus width and frequency are displayed -as well. - -To associate an interface with a physical adapter use "ethtool -p <ethX>". -The corresponding adapter's LED will blink multiple times. - -3. Features supported -===================== -a. Jumbo frames. Xframe I/II supports MTU up to 9600 bytes, - modifiable using ip command. - -b. Offloads. Supports checksum offload(TCP/UDP/IP) on transmit - and receive, TSO. - -c. Multi-buffer receive mode. Scattering of packet across multiple - buffers. Currently driver supports 2-buffer mode which yields - significant performance improvement on certain platforms(SGI Altix, - IBM xSeries). - -d. MSI/MSI-X. Can be enabled on platforms which support this feature - resulting in noticeable performance improvement (up to 7% on certain - platforms). - -e. Statistics. Comprehensive MAC-level and software statistics displayed - using "ethtool -S" option. - -f. Multi-FIFO/Ring. Supports up to 8 transmit queues and receive rings, - with multiple steering options. - -4. Command line parameters -========================== - -a. tx_fifo_num - Number of transmit queues - -Valid range: 1-8 - -Default: 1 - -b. rx_ring_num - Number of receive rings - -Valid range: 1-8 - -Default: 1 - -c. tx_fifo_len - Size of each transmit queue - -Valid range: Total length of all queues should not exceed 8192 - -Default: 4096 - -d. rx_ring_sz - Size of each receive ring(in 4K blocks) - -Valid range: Limited by memory on system - -Default: 30 - -e. 
intr_type - Specifies interrupt type. Possible values 0(INTA), 2(MSI-X) - -Valid values: 0, 2 - -Default: 2 - -5. Performance suggestions -========================== - -General: - -a. Set MTU to maximum(9000 for switch setup, 9600 in back-to-back configuration) -b. Set TCP windows size to optimal value. - -For instance, for MTU=1500 a value of 210K has been observed to result in -good performance:: - - # sysctl -w net.ipv4.tcp_rmem="210000 210000 210000" - # sysctl -w net.ipv4.tcp_wmem="210000 210000 210000" - -For MTU=9000, TCP window size of 10 MB is recommended:: - - # sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000" - # sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000" - -Transmit performance: - -a. By default, the driver respects BIOS settings for PCI bus parameters. - However, you may want to experiment with PCI bus parameters - max-split-transactions(MOST) and MMRBC (use setpci command). - - A MOST value of 2 has been found optimal for Opterons and 3 for Itanium. - - It could be different for your hardware. - - Set MMRBC to 4K**. - - For example you can set - - For opteron:: - - #setpci -d 17d5:* 62=1d - - For Itanium:: - - #setpci -d 17d5:* 62=3d - - For detailed description of the PCI registers, please see Xframe User Guide. - -b. Ensure Transmit Checksum offload is enabled. Use ethtool to set/verify this - parameter. - -c. Turn on TSO(using "ethtool -K"):: - - # ethtool -K <ethX> tso on - -Receive performance: - -a. By default, the driver respects BIOS settings for PCI bus parameters. - However, you may want to set PCI latency timer to 248:: - - #setpci -d 17d5:* LATENCY_TIMER=f8 - - For detailed description of the PCI registers, please see Xframe User Guide. - -b. Use 2-buffer mode. This results in large performance boost on - certain platforms(eg. SGI Altix, IBM xSeries). - -c. Ensure Receive Checksum offload is enabled. Use "ethtool -K ethX" command to - set/verify this option. - -d. 
Enable NAPI feature(in kernel configuration Device Drivers ---> Network - device support ---> Ethernet (10000 Mbit) ---> S2IO 10Gbe Xframe NIC) to - bring down CPU utilization. - -.. note:: - - For AMD opteron platforms with 8131 chipset, MMRBC=1 and MOST=1 are - recommended as safe parameters. - -For more information, please review the AMD8131 errata at -http://vip.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/ -26310_AMD-8131_HyperTransport_PCI-X_Tunnel_Revision_Guide_rev_3_18.pdf - -6. Support -========== - -For further support please contact either your 10GbE Xframe NIC vendor (IBM, -HP, SGI etc.) diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 0f72de94b881..c2406bd8ae0b 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -96,6 +96,7 @@ Contents: packet_mmap phonet phy-link-topology + phy-port pktgen plip ppp_generic diff --git a/Documentation/networking/iou-zcrx.rst b/Documentation/networking/iou-zcrx.rst index 54a72e172bdc..7f3f4b2e6cf2 100644 --- a/Documentation/networking/iou-zcrx.rst +++ b/Documentation/networking/iou-zcrx.rst @@ -196,6 +196,26 @@ Return buffers back to the kernel to be used again:: rqe->len = cqe->res; IO_URING_WRITE_ONCE(*refill_ring.ktail, ++refill_ring.rq_tail); +Area chunking +------------- + +zcrx splits the memory area into fixed-length physically contiguous chunks. +This limits the maximum buffer size returned in a single io_uring CQE. Users +can provide a hint to the kernel to use larger chunks by setting the +``rx_buf_len`` field of ``struct io_uring_zcrx_ifq_reg`` to the desired length +during registration. If this field is set to zero, the kernel defaults to +the system page size. + +To use larger sizes, the memory area must be backed by physically contiguous +ranges whose sizes are multiples of ``rx_buf_len``. It also requires kernel +and hardware support. 
If registration fails, users are generally expected to +fall back to defaults by setting ``rx_buf_len`` to zero. + +Larger chunks don't give any additional guarantees about buffer sizes returned +in CQEs, and they can vary depending on many factors like traffic pattern, +hardware offload, etc. It doesn't require any application changes beyond zcrx +registration. + Testing ======= diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index bc9a01606daf..28c7e4f5ecf9 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -482,7 +482,9 @@ tcp_ecn_option - INTEGER 1 Send AccECN option sparingly according to the minimum option rules outlined in draft-ietf-tcpm-accurate-ecn. 2 Send AccECN option on every packet whenever it fits into TCP - option space. + option space except when AccECN fallback is triggered. + 3 Send AccECN option on every packet whenever it fits into TCP + option space even when AccECN fallback is triggered. 
= ============================================================ Default: 2 diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst index 26f32dbcf6ec..563daea10d6c 100644 --- a/Documentation/networking/net_cachelines/tcp_sock.rst +++ b/Documentation/networking/net_cachelines/tcp_sock.rst @@ -105,6 +105,7 @@ u32 received_ce read_mostly read_w u32[3] received_ecn_bytes read_mostly read_write u8:4 received_ce_pending read_mostly read_write u32[3] delivered_ecn_bytes read_write +u16 pkts_acked_ewma read_write u8:2 syn_ect_snt write_mostly read_write u8:2 syn_ect_rcv read_mostly read_write u8:2 accecn_minlen write_mostly read_write diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst index 7ebb6c36482d..35704d115312 100644 --- a/Documentation/networking/netdevices.rst +++ b/Documentation/networking/netdevices.rst @@ -80,7 +80,7 @@ unregister_netdev() closes the device and waits for all users to be done with it. The memory of struct net_device itself may still be referenced by sysfs but all operations on that device will fail. -free_netdev() can be called after unregister_netdev() returns on when +free_netdev() can be called after unregister_netdev() returns or when register_netdev() failed. Device management under RTNL @@ -333,7 +333,7 @@ In the future, there will be an option for individual drivers to opt out of using ``rtnl_lock`` and instead perform their control operations directly under the netdev instance lock. -Devices drivers are encouraged to rely on the instance lock where possible. +Device drivers are encouraged to rely on the instance lock where possible. 
For the (mostly software) drivers that need to interact with the core stack, there are two sets of interfaces: ``dev_xxx``/``netdev_xxx`` and ``netif_xxx`` diff --git a/Documentation/networking/phy-port.rst b/Documentation/networking/phy-port.rst new file mode 100644 index 000000000000..6e28d9094bce --- /dev/null +++ b/Documentation/networking/phy-port.rst @@ -0,0 +1,111 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. _phy_port: + +================= +Ethernet ports +================= + +This document is a basic description of the phy_port infrastructure, +introduced to represent physical interfaces of Ethernet devices. + +Without phy_port, we already have quite a lot of information about what the +media-facing interface of a NIC can do and looks like, through the +:c:type:`struct ethtool_link_ksettings <ethtool_link_ksettings>` attributes, +which includes : + + - What the NIC can do through the :c:member:`supported` field + - What the Link Partner advertises through :c:member:`lp_advertising` + - Which features we're advertising through :c:member:`advertising` + +We also have info about the number of pairs and the PORT type. These settings +are built by aggregating together information reported by various devices that +are sitting on the link : + + - The NIC itself, through the :c:member:`get_link_ksettings` callback + - Precise information from the MAC and PCS by using phylink in the MAC driver + - Information reported by the PHY device + - Information reported by an SFP module (which can itself include a PHY) + +This model however starts showing its limitations when we consider devices that +have more than one media interface. In such a case, only information about the +actively used interface is reported, and it's not possible to know what the +other interfaces can do. In fact, we have very little information about whether +or not there are any other media interfaces. 
+ +The goal of the phy_port representation is to provide a way of representing a +physical interface of a NIC, regardless of what is driving the port (NIC through +a firmware, SFP module, Ethernet PHY). + +Multi-port interfaces examples +============================== + +Several cases of multi-interface NICs have been observed so far : + +Internal MII Mux:: + + +------------------+ + | SoC | + | +-----+ | +-----+ + | +-----+ | |-------------| PHY | + | | MAC |--| Mux | | +-----+ +-----+ + | +-----+ | |-----| SFP | + | +-----+ | +-----+ + +------------------+ + +Internal Mux with internal PHY:: + + +------------------------+ + | SoC | + | +-----+ +-----+ + | +-----+ | |-| PHY | + | | MAC |--| Mux | +-----+ +-----+ + | +-----+ | |-----------| SFP | + | +-----+ | +-----+ + +------------------------+ + +External Mux:: + + +---------+ + | SoC | +-----+ +-----+ + | | | |--| PHY | + | +-----+ | | | +-----+ + | | MAC |----| Mux | +-----+ + | +-----+ | | |--| PHY | + | | +-----+ +-----+ + | | | + | GPIO-------+ + +---------+ + +Double-port PHY:: + + +---------+ + | SoC | +-----+ + | | | |--- RJ45 + | +-----+ | | | + | | MAC |---| PHY | +-----+ + | +-----+ | | |---| SFP | + +---------+ +-----+ +-----+ + +phy_port aims at providing a path to support all the above topologies, by +representing the media interfaces in a way that's agnostic to what's driving +the interface. the struct phy_port object has its own set of callback ops, and +will eventually be able to report its own ksettings:: + + _____ +------+ + ( )-----| Port | + +-----+ ( ) +------+ + | MAC |--( ??? ) + +-----+ ( ) +------+ + (_____)-----| Port | + +------+ + +Next steps +========== + +As of writing this documentation, only ports controlled by PHY devices are +supported. The next steps will be to add the Netlink API to expose these +to userspace and add support for raw ports (controlled by some firmware, and directly +managed by the NIC driver). 
+ +Another parallel task is the introduction of a MII muxing framework to allow the +control of non-PHY driver multi-port setups. diff --git a/Documentation/networking/phy.rst b/Documentation/networking/phy.rst index b0f2ef83735d..0170c9d4dc5e 100644 --- a/Documentation/networking/phy.rst +++ b/Documentation/networking/phy.rst @@ -524,33 +524,13 @@ When a match is found, the PHY layer will invoke the run function associated with the fixup. This function is passed a pointer to the phy_device of interest. It should therefore only operate on that PHY. -The platform code can either register the fixup using phy_register_fixup():: - - int phy_register_fixup(const char *phy_id, - u32 phy_uid, u32 phy_uid_mask, - int (*run)(struct phy_device *)); - -Or using one of the two stubs, phy_register_fixup_for_uid() and -phy_register_fixup_for_id():: +The platform code can register the fixup using one of:: int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask, int (*run)(struct phy_device *)); int phy_register_fixup_for_id(const char *phy_id, int (*run)(struct phy_device *)); -The stubs set one of the two matching criteria, and set the other one to -match anything. - -When phy_register_fixup() or \*_for_uid()/\*_for_id() is called at module load -time, the module needs to unregister the fixup and free allocated memory when -it's unloaded. - -Call one of following function before unloading module:: - - int phy_unregister_fixup(const char *phy_id, u32 phy_uid, u32 phy_uid_mask); - int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask); - int phy_register_fixup_for_id(const char *phy_id); - Standards ========= diff --git a/Documentation/networking/scaling.rst b/Documentation/networking/scaling.rst index 99b6a61e5e31..0023afa530ec 100644 --- a/Documentation/networking/scaling.rst +++ b/Documentation/networking/scaling.rst @@ -38,11 +38,15 @@ that is not the focus of these techniques. 
The filter used in RSS is typically a hash function over the network and/or transport layer headers-- for example, a 4-tuple hash over IP addresses and TCP ports of a packet. The most common hardware -implementation of RSS uses a 128-entry indirection table where each entry +implementation of RSS uses an indirection table where each entry stores a queue number. The receive queue for a packet is determined -by masking out the low order seven bits of the computed hash for the -packet (usually a Toeplitz hash), taking this number as a key into the -indirection table and reading the corresponding value. +by indexing the indirection table with the low order bits of the +computed hash for the packet (usually a Toeplitz hash). + +The indirection table helps even out the traffic distribution when queue +count is not a power of two. NICs should provide an indirection table +at least 4 times larger than the queue count. 4x table results in ~16% +imbalance between the queues, which is acceptable for most applications. Some NICs support symmetric RSS hashing where, if the IP (source address, destination address) and TCP/UDP (source port, destination port) tuples diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst index 7aabead90648..2162c4f2b28a 100644 --- a/Documentation/networking/timestamping.rst +++ b/Documentation/networking/timestamping.rst @@ -627,10 +627,9 @@ ioctl(SIOCSHWTSTAMP). However, this has not been implemented in all drivers. -------------------------------------------------------- A driver which supports hardware time stamping must support the -ndo_hwtstamp_set NDO or the legacy SIOCSHWTSTAMP ioctl and update the -supplied struct hwtstamp_config with the actual values as described in -the section on SIOCSHWTSTAMP. It should also support ndo_hwtstamp_get or -the legacy SIOCGHWTSTAMP. 
+ndo_hwtstamp_set NDO and update the supplied struct hwtstamp_config with +the actual values as described in the section on SIOCSHWTSTAMP. It +should also support ndo_hwtstamp_get NDO to retrieve configuration. Time stamps for received packets must be stored in the skb. To get a pointer to the shared time stamp structure of the skb call skb_hwtstamps(). Then diff --git a/Documentation/networking/tls-offload.rst b/Documentation/networking/tls-offload.rst index 7354d48cdf92..c173f537bf4d 100644 --- a/Documentation/networking/tls-offload.rst +++ b/Documentation/networking/tls-offload.rst @@ -318,6 +318,36 @@ is restarted. When the header is matched the device sends a confirmation request to the kernel, asking if the guessed location is correct (if a TLS record really starts there), and which record sequence number the given header had. + +The asynchronous resync process is coordinated on the kernel side using +struct tls_offload_resync_async, which tracks and manages the resync request. + +Helper functions to manage struct tls_offload_resync_async: + +``tls_offload_rx_resync_async_request_start()`` +Initializes an asynchronous resync attempt by specifying the sequence range to +monitor and resetting internal state in the struct. + +``tls_offload_rx_resync_async_request_end()`` +Retains the device's guessed TCP sequence number for comparison with current or +future logged ones. It also clears the RESYNC_REQ_ASYNC flag from the resync +request, indicating that the device has submitted its guessed sequence number. + +``tls_offload_rx_resync_async_request_cancel()`` +Cancels any in-progress resync attempt, clearing the request state. + +When the kernel processes an RX segment that begins a new TLS record, it +examines the current status of the asynchronous resynchronization request. 
+ +If the device is still waiting to provide its guessed TCP sequence number +(the async state), the kernel records the sequence number of this segment so +that it can later be compared once the device's guess becomes available. + +If the device has already submitted its guessed sequence number (the non-async +state), the kernel now tries to match that guess against the sequence numbers of +all TLS record headers that have been logged since the resync request +started. + The kernel confirms the guessed location was correct and tells the device the record sequence number. Meanwhile, the device had been parsing and counting all records since the just-confirmed one, it adds the number |
