summaryrefslogtreecommitdiff
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/6pack.rst2
-rw-r--r--Documentation/networking/arcnet-hardware.rst22
-rw-r--r--Documentation/networking/arcnet.rst48
-rw-r--r--Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst10
-rw-r--r--Documentation/networking/device_drivers/ethernet/index.rst1
-rw-r--r--Documentation/networking/device_drivers/ethernet/mucse/rnpgbe.rst17
-rw-r--r--Documentation/networking/devlink/devlink-eswitch-attr.rst13
-rw-r--r--Documentation/networking/devlink/devlink-params.rst14
-rw-r--r--Documentation/networking/devlink/i40e.rst34
-rw-r--r--Documentation/networking/devlink/index.rst1
-rw-r--r--Documentation/networking/devlink/mlx5.rst14
-rw-r--r--Documentation/networking/devlink/stmmac.rst40
-rw-r--r--Documentation/networking/dsa/dsa.rst17
-rw-r--r--Documentation/networking/ethtool-netlink.rst64
-rw-r--r--Documentation/networking/index.rst5
-rw-r--r--Documentation/networking/ip-sysctl.rst60
-rw-r--r--Documentation/networking/napi.rst50
-rw-r--r--Documentation/networking/net_cachelines/inet_connection_sock.rst2
-rw-r--r--Documentation/networking/net_cachelines/inet_sock.rst79
-rw-r--r--Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst3
-rw-r--r--Documentation/networking/netconsole.rst2
-rw-r--r--Documentation/networking/nfc.rst6
-rw-r--r--Documentation/networking/smc-sysctl.rst40
-rw-r--r--Documentation/networking/statistics.rst4
-rw-r--r--Documentation/networking/tls.rst20
-rw-r--r--Documentation/networking/xfrm/index.rst13
-rw-r--r--Documentation/networking/xfrm/xfrm_device.rst (renamed from Documentation/networking/xfrm_device.rst)20
-rw-r--r--Documentation/networking/xfrm/xfrm_proc.rst (renamed from Documentation/networking/xfrm_proc.rst)0
-rw-r--r--Documentation/networking/xfrm/xfrm_sync.rst (renamed from Documentation/networking/xfrm_sync.rst)97
-rw-r--r--Documentation/networking/xfrm/xfrm_sysctl.rst (renamed from Documentation/networking/xfrm_sysctl.rst)4
30 files changed, 536 insertions, 166 deletions
diff --git a/Documentation/networking/6pack.rst b/Documentation/networking/6pack.rst
index bc5bf1f1a98f..66d5fd4fc821 100644
--- a/Documentation/networking/6pack.rst
+++ b/Documentation/networking/6pack.rst
@@ -94,7 +94,7 @@ kernels may lead to a compilation error because the interface to a kernel
function has been changed in the 2.1.8x kernels.
How to turn on 6pack support:
-=============================
+-----------------------------
- In the linux kernel configuration program, select the code maturity level
options menu and turn on the prompting for development drivers.
diff --git a/Documentation/networking/arcnet-hardware.rst b/Documentation/networking/arcnet-hardware.rst
index 3bf7f99cd7bb..20e5075d0d0e 100644
--- a/Documentation/networking/arcnet-hardware.rst
+++ b/Documentation/networking/arcnet-hardware.rst
@@ -4,18 +4,20 @@
ARCnet Hardware
===============
+:Author: Avery Pennarun <apenwarr@worldvisions.ca>
+
.. note::
- 1) This file is a supplement to arcnet.txt. Please read that for general
+ 1) This file is a supplement to arcnet.rst. Please read that for general
driver configuration help.
2) This file is no longer Linux-specific. It should probably be moved out
of the kernel sources. Ideas?
Because so many people (myself included) seem to have obtained ARCnet cards
without manuals, this file contains a quick introduction to ARCnet hardware,
-some cabling tips, and a listing of all jumper settings I can find. Please
-e-mail apenwarr@worldvisions.ca with any settings for your particular card,
-or any other information you have!
+some cabling tips, and a listing of all jumper settings I can find. If you
+have any settings for your particular card, and/or any other information you
+have, do not hesitate to :ref:`email to netdev <arcnet-netdev>`.
Introduction to ARCnet
@@ -72,11 +74,10 @@ level of encapsulation is defined by RFC1201, which I call "packet
splitting," that allows "virtual packets" to grow as large as 64K each,
although they are generally kept down to the Ethernet-style 1500 bytes.
-For more information on the advantages and disadvantages (mostly the
-advantages) of ARCnet networks, you might try the "ARCnet Trade Association"
-WWW page:
+For more information on ARCnet networks, visit the "ARCNET Resource Center"
+WWW page at:
- http://www.arcnet.com
+ https://www.arcnet.cc
Cabling ARCnet Networks
@@ -3226,9 +3227,6 @@ Settings for IRQ Selection (Lower Jumper Line)
Other Cards
===========
-I have no information on other models of ARCnet cards at the moment. Please
-send any and all info to:
-
- apenwarr@worldvisions.ca
+I have no information on other models of ARCnet cards at the moment.
Thanks.
diff --git a/Documentation/networking/arcnet.rst b/Documentation/networking/arcnet.rst
index 82fce606c0f0..cd43a18ad149 100644
--- a/Documentation/networking/arcnet.rst
+++ b/Documentation/networking/arcnet.rst
@@ -4,6 +4,8 @@
ARCnet
======
+:Author: Avery Pennarun <apenwarr@worldvisions.ca>
+
.. note::
See also arcnet-hardware.txt in this directory for jumper-setting
@@ -30,18 +32,7 @@ Come on, be a sport! Send me a success report!
(hey, that was even better than my original poem... this is getting bad!)
-
-.. warning::
-
- If you don't e-mail me about your success/failure soon, I may be forced to
- start SINGING. And we don't want that, do we?
-
- (You know, it might be argued that I'm pushing this point a little too much.
- If you think so, why not flame me in a quick little e-mail? Please also
- include the type of card(s) you're using, software, size of network, and
- whether it's working or not.)
-
- My e-mail address is: apenwarr@worldvisions.ca
+----
These are the ARCnet drivers for Linux.
@@ -59,23 +50,14 @@ ARCnet 2.10 ALPHA, Tomasz's all-new-and-improved RFC1051 support has been
included and seems to be working fine!
+.. _arcnet-netdev:
+
Where do I discuss these drivers?
---------------------------------
-Tomasz has been so kind as to set up a new and improved mailing list.
-Subscribe by sending a message with the BODY "subscribe linux-arcnet YOUR
-REAL NAME" to listserv@tichy.ch.uj.edu.pl. Then, to submit messages to the
-list, mail to linux-arcnet@tichy.ch.uj.edu.pl.
-
-There are archives of the mailing list at:
-
- http://epistolary.org/mailman/listinfo.cgi/arcnet
-
-The people on linux-net@vger.kernel.org (now defunct, replaced by
-netdev@vger.kernel.org) have also been known to be very helpful, especially
-when we're talking about ALPHA Linux kernels that may or may not work right
-in the first place.
-
+ARCnet discussions take place on netdev. Simply send your email to
+netdev@vger.kernel.org and make sure to Cc: maintainer listed in
+"ARCNET NETWORK LAYER" heading of Documentation/process/maintainers.rst.
Other Drivers and Info
----------------------
@@ -523,17 +505,9 @@ can set up your network then:
It works: what now?
-------------------
-Send mail describing your setup, preferably including driver version, kernel
-version, ARCnet card model, CPU type, number of systems on your network, and
-list of software in use to me at the following address:
-
- apenwarr@worldvisions.ca
-
-I do send (sometimes automated) replies to all messages I receive. My email
-can be weird (and also usually gets forwarded all over the place along the
-way to me), so if you don't get a reply within a reasonable time, please
-resend.
-
+Send mail following :ref:`arcnet-netdev`. Describe your setup, preferably
+including driver version, kernel version, ARCnet card model, CPU type, number
+of systems on your network, and list of software in use.
It doesn't work: what now?
--------------------------
diff --git a/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst b/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst
index 6877a3260582..5aedbabb7382 100644
--- a/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst
+++ b/Documentation/networking/device_drivers/cellular/qualcomm/rmnet.rst
@@ -28,6 +28,7 @@ these MAP frames and send them to appropriate PDN's.
================
a. MAP packet v1 (data / control)
+---------------------------------
MAP header fields are in big endian format.
@@ -54,6 +55,7 @@ Payload length includes the padding length but does not include MAP header
length.
b. Map packet v4 (data / control)
+---------------------------------
MAP header fields are in big endian format.
@@ -107,6 +109,7 @@ over which checksum is computed.
Checksum value, indicates the checksum computed.
c. MAP packet v5 (data / control)
+---------------------------------
MAP header fields are in big endian format.
@@ -134,6 +137,7 @@ Payload length includes the padding length but does not include MAP header
length.
d. Checksum offload header v5
+-----------------------------
Checksum offload header fields are in big endian format.
@@ -158,7 +162,10 @@ indicates that the calculated packet checksum is invalid.
Reserved bits must be zero when sent and ignored when received.
-e. MAP packet v1/v5 (command specific)::
+e. MAP packet v1/v5 (command specific)
+--------------------------------------
+
+Packet format::
Bit 0 1 2-7 8 - 15 16 - 31
Function Command Reserved Pad Multiplexer ID Payload length
@@ -181,6 +188,7 @@ Command types
= ==========================================
f. Aggregation
+--------------
Aggregation is multiple MAP packets (can be data or command) delivered to
rmnet in a single linear skb. rmnet will process the individual
diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst
index 7cfcd183054f..bcc02355f828 100644
--- a/Documentation/networking/device_drivers/ethernet/index.rst
+++ b/Documentation/networking/device_drivers/ethernet/index.rst
@@ -47,6 +47,7 @@ Contents:
mellanox/mlx5/index
meta/fbnic
microsoft/netvsc
+ mucse/rnpgbe
neterion/s2io
netronome/nfp
pensando/ionic
diff --git a/Documentation/networking/device_drivers/ethernet/mucse/rnpgbe.rst b/Documentation/networking/device_drivers/ethernet/mucse/rnpgbe.rst
new file mode 100644
index 000000000000..d35cf8a46b6c
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/mucse/rnpgbe.rst
@@ -0,0 +1,17 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================================
+Linux Base Driver for MUCSE(R) Gigabit PCI Express Adapters
+===========================================================
+
+Contents
+========
+
+- Identifying Your Adapter
+
+Identifying Your Adapter
+========================
+The driver is compatible with devices based on the following:
+
+ * MUCSE(R) Ethernet Controller N210 series
+ * MUCSE(R) Ethernet Controller N500 series
diff --git a/Documentation/networking/devlink/devlink-eswitch-attr.rst b/Documentation/networking/devlink/devlink-eswitch-attr.rst
index 08bb39ab1528..eafe09abc40c 100644
--- a/Documentation/networking/devlink/devlink-eswitch-attr.rst
+++ b/Documentation/networking/devlink/devlink-eswitch-attr.rst
@@ -39,6 +39,10 @@ The following is a list of E-Switch attributes.
rules.
* ``switchdev`` allows for more advanced offloading capabilities of
the E-Switch to hardware.
+ * ``switchdev_inactive`` switchdev mode but starts inactive, doesn't allow traffic
+ until explicitly activated. This mode is useful for orchestrators that
+ want to prepare the device in switchdev mode but only activate it when
+ all configurations are done.
* - ``inline-mode``
- enum
- Some HWs need the VF driver to put part of the packet
@@ -74,3 +78,12 @@ Example Usage
# enable encap-mode with legacy mode
$ devlink dev eswitch set pci/0000:08:00.0 mode legacy inline-mode none encap-mode basic
+
+ # start switchdev mode in inactive state
+ $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev_inactive
+
+ # setup switchdev configurations, representors, FDB entries, etc..
+ ...
+
+ # activate switchdev mode to allow traffic
+ $ devlink dev eswitch set pci/0000:08:00.0 mode switchdev
diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
index 0a9c20d70122..ea17756dcda6 100644
--- a/Documentation/networking/devlink/devlink-params.rst
+++ b/Documentation/networking/devlink/devlink-params.rst
@@ -41,6 +41,16 @@ In order for ``driverinit`` parameters to take effect, the driver must
support reloading via the ``devlink-reload`` command. This command will
request a reload of the device driver.
+Default parameter values
+=========================
+
+Drivers may optionally export default values for parameters of cmode
+``runtime`` and ``permanent``. For ``driverinit`` parameters, the last
+value set by the driver will be used as the default value. Drivers can
+also support resetting params with cmode ``runtime`` and ``permanent``
+to their default values. Resetting ``driverinit`` params is supported
+by devlink core without additional driver support needed.
+
.. _devlink_params_generic:
Generic configuration parameters
@@ -151,3 +161,7 @@ own name.
* - ``num_doorbells``
- u32
- Controls the number of doorbells used by the device.
+ * - ``max_mac_per_vf``
+ - u32
+ - Controls the maximum number of MAC address filters that can be assigned
+ to a Virtual Function (VF).
diff --git a/Documentation/networking/devlink/i40e.rst b/Documentation/networking/devlink/i40e.rst
index d3cb5bb5197e..51c887f0dc83 100644
--- a/Documentation/networking/devlink/i40e.rst
+++ b/Documentation/networking/devlink/i40e.rst
@@ -7,6 +7,40 @@ i40e devlink support
This document describes the devlink features implemented by the ``i40e``
device driver.
+Parameters
+==========
+
+.. list-table:: Generic parameters implemented
+ :widths: 5 5 90
+
+ * - Name
+ - Mode
+ - Notes
+ * - ``max_mac_per_vf``
+ - runtime
+ - Controls the maximum number of MAC addresses a VF can use
+ on i40e devices.
+
+ By default (``0``), the driver enforces its internally calculated per-VF
+ MAC filter limit, which is based on the number of allocated VFS.
+
+ If set to a non-zero value, this parameter acts as a strict cap:
+ the driver will use the user-provided value instead of its internal
+ calculation.
+
+ **Important notes:**
+
+ - This value **must be set before enabling SR-IOV**.
+ Attempting to change it while SR-IOV is enabled will return an error.
+ - MAC filters are a **shared hardware resource** across all VFs.
+ Setting a high value may cause other VFs to be starved of filters.
+ - This value is a **Administrative policy**. The hardware may return
+ errors when its absolute limit is reached, regardless of the value
+ set here.
+
+ The default value is ``0`` (internal calculation is used).
+
+
Info versions
=============
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index 0c58e5c729d9..35b12a2bfeba 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -99,5 +99,6 @@ parameters, info versions, and other features it supports.
prestera
qed
sfc
+ stmmac
ti-cpsw-switch
zl3073x
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 0e5f9c76e514..4bba4d780a4a 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -218,6 +218,20 @@ parameters.
* ``balanced`` : Merges fewer CQEs, resulting in a moderate compression ratio but maintaining a balance between bandwidth savings and performance
* ``aggressive`` : Merges more CQEs into a single entry, achieving a higher compression rate and maximizing performance, particularly under high traffic loads
+ * - ``swp_l4_csum_mode``
+ - string
+ - permanent
+ - Configure how the L4 checksum is calculated by the device when using
+ Software Parser (SWP) hints for header locations.
+
+ * ``default`` : Use the device's default checksum calculation
+ mode. The driver will discover during init whether or
+ full_csum or l4_only is in use. Setting this value explicitly
+ from userspace is not allowed, but some firmware versions may
+ return this value on param read.
+ * ``full_csum`` : Calculate full checksum including the pseudo-header
+ * ``l4_only`` : Calculate L4-only checksum, excluding the pseudo-header
+
The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``
Info versions
diff --git a/Documentation/networking/devlink/stmmac.rst b/Documentation/networking/devlink/stmmac.rst
new file mode 100644
index 000000000000..47e3ff10bc08
--- /dev/null
+++ b/Documentation/networking/devlink/stmmac.rst
@@ -0,0 +1,40 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+stmmac (synopsys dwmac) devlink support
+=======================================
+
+This document describes the devlink features implemented by the ``stmmac``
+device driver.
+
+Parameters
+==========
+
+The ``stmmac`` driver implements the following driver-specific parameters.
+
+.. list-table:: Driver-specific parameters implemented
+ :widths: 5 5 5 85
+
+ * - Name
+ - Type
+ - Mode
+ - Description
+ * - ``phc_coarse_adj``
+ - Boolean
+ - runtime
+ - Enable the Coarse timestamping mode, as defined in the DWMAC TRM.
+ A detailed explanation of this timestamping mode can be found in the
+ Socfpga Functionnal Description [1].
+
+ In Coarse mode, the ptp clock is expected to be fed by a high-precision
+ clock that is externally adjusted, and the subsecond increment used for
+ timestamping is set to 1/ptp_clock_rate.
+
+ In Fine mode (i.e. Coarse mode == false), the ptp clock frequency is
+ continuously adjusted, but the subsecond increment is set to
+ 2/ptp_clock_rate.
+
+ Coarse mode is suitable for PTP Grand Master operation. If unsure, leave
+ the parameter to False.
+
+ [1] https://www.intel.com/content/www/us/en/docs/programmable/683126/21-2/functional-description-of-the-emac.html
diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst
index 7b2e69cd7ef0..5c79740a533b 100644
--- a/Documentation/networking/dsa/dsa.rst
+++ b/Documentation/networking/dsa/dsa.rst
@@ -1104,12 +1104,11 @@ health of the network and for discovery of other nodes.
In Linux, both HSR and PRP are implemented in the hsr driver, which
instantiates a virtual, stackable network interface with two member ports.
The driver only implements the basic roles of DANH (Doubly Attached Node
-implementing HSR) and DANP (Doubly Attached Node implementing PRP); the roles
-of RedBox and QuadBox are not implemented (therefore, bridging a hsr network
-interface with a physical switch port does not produce the expected result).
+implementing HSR), DANP (Doubly Attached Node implementing PRP) and RedBox
+(allows non-HSR devices to connect to the ring via Interlink ports).
-A driver which is able of offloading certain functions of a DANP or DANH should
-declare the corresponding netdev features as indicated by the documentation at
+A driver which is able of offloading certain functions should declare the
+corresponding netdev features as indicated by the documentation at
``Documentation/networking/netdev-features.rst``. Additionally, the following
methods must be implemented:
@@ -1120,6 +1119,14 @@ methods must be implemented:
- ``port_hsr_leave``: function invoked when a given switch port leaves a
DANP/DANH and returns to normal operation as a standalone port.
+Note that the ``NETIF_F_HW_HSR_DUP`` feature relies on transmission towards
+multiple ports, which is generally available whenever the tagging protocol uses
+the ``dsa_xmit_port_mask()`` helper function. If the helper is used, the HSR
+offload feature should also be set. The ``dsa_port_simple_hsr_join()`` and
+``dsa_port_simple_hsr_leave()`` methods can be used as generic implementations
+of ``port_hsr_join`` and ``port_hsr_leave``, if this is the only supported
+offload feature.
+
TODO
====
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index b270886c5f5d..af56c304cef4 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -242,6 +242,7 @@ Userspace to kernel:
``ETHTOOL_MSG_RSS_SET`` set RSS settings
``ETHTOOL_MSG_RSS_CREATE_ACT`` create an additional RSS context
``ETHTOOL_MSG_RSS_DELETE_ACT`` delete an additional RSS context
+ ``ETHTOOL_MSG_MSE_GET`` get MSE diagnostic data
===================================== =================================
Kernel to userspace:
@@ -299,6 +300,7 @@ Kernel to userspace:
``ETHTOOL_MSG_RSS_CREATE_ACT_REPLY`` create an additional RSS context
``ETHTOOL_MSG_RSS_CREATE_NTF`` additional RSS context created
``ETHTOOL_MSG_RSS_DELETE_NTF`` additional RSS context deleted
+ ``ETHTOOL_MSG_MSE_GET_REPLY`` MSE diagnostic data
======================================== =================================
``GET`` requests are sent by userspace applications to retrieve device
@@ -2458,6 +2460,68 @@ Kernel response contents:
For a description of each attribute, see ``TSCONFIG_GET``.
+MSE_GET
+=======
+
+Retrieves detailed Mean Square Error (MSE) diagnostic information from the PHY.
+
+Request Contents:
+
+ ==================================== ====== ============================
+ ``ETHTOOL_A_MSE_HEADER`` nested request header
+ ==================================== ====== ============================
+
+Kernel Response Contents:
+
+ ==================================== ====== ================================
+ ``ETHTOOL_A_MSE_HEADER`` nested reply header
+ ``ETHTOOL_A_MSE_CAPABILITIES`` nested capability/scale info for MSE
+ measurements
+ ``ETHTOOL_A_MSE_CHANNEL_A`` nested snapshot for Channel A
+ ``ETHTOOL_A_MSE_CHANNEL_B`` nested snapshot for Channel B
+ ``ETHTOOL_A_MSE_CHANNEL_C`` nested snapshot for Channel C
+ ``ETHTOOL_A_MSE_CHANNEL_D`` nested snapshot for Channel D
+ ``ETHTOOL_A_MSE_WORST_CHANNEL`` nested snapshot for worst channel
+ ``ETHTOOL_A_MSE_LINK`` nested snapshot for link-wide aggregate
+ ==================================== ====== ================================
+
+MSE Capabilities
+----------------
+
+This nested attribute reports the capability / scaling properties used to
+interpret snapshot values.
+
+ ============================================== ====== =========================
+ ``ETHTOOL_A_MSE_CAPABILITIES_MAX_AVERAGE_MSE`` uint max avg_mse scale
+ ``ETHTOOL_A_MSE_CAPABILITIES_MAX_PEAK_MSE`` uint max peak_mse scale
+ ``ETHTOOL_A_MSE_CAPABILITIES_REFRESH_RATE_PS`` uint sample rate (picoseconds)
+ ``ETHTOOL_A_MSE_CAPABILITIES_NUM_SYMBOLS`` uint symbols per HW sample
+ ============================================== ====== =========================
+
+The max-average/peak fields are included only if the corresponding metric
+is supported by the PHY. Their absence indicates that the metric is not
+available.
+
+See ``struct phy_mse_capability`` kernel documentation in
+``include/linux/phy.h``.
+
+MSE Snapshot
+------------
+
+Each per-channel nest contains an atomic snapshot of MSE values for that
+selector (channel A/B/C/D, worst channel, or link).
+
+ ========================================== ====== ===================
+ ``ETHTOOL_A_MSE_SNAPSHOT_AVERAGE_MSE`` uint average MSE value
+ ``ETHTOOL_A_MSE_SNAPSHOT_PEAK_MSE`` uint current peak MSE
+ ``ETHTOOL_A_MSE_SNAPSHOT_WORST_PEAK_MSE`` uint worst-case peak MSE
+ ========================================== ====== ===================
+
+Within each channel nest, only the metrics supported by the PHY will be present.
+
+See ``struct phy_mse_snapshot`` kernel documentation in
+``include/linux/phy.h``.
+
Request translation
===================
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index c775cababc8c..75db2251649b 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -131,10 +131,7 @@ Contents:
vxlan
x25
x25-iface
- xfrm_device
- xfrm_proc
- xfrm_sync
- xfrm_sysctl
+ xfrm/index
xdp-rx-metadata
xsk-tx-metadata
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index a06cb99d66dc..bc9a01606daf 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -673,6 +673,16 @@ tcp_moderate_rcvbuf - BOOLEAN
Default: 1 (enabled)
+tcp_rcvbuf_low_rtt - INTEGER
+ rcvbuf autotuning can over estimate final socket rcvbuf, which
+ can lead to cache trashing for high throughput flows.
+
+ For small RTT flows (below tcp_rcvbuf_low_rtt usecs), we can relax
+ rcvbuf growth: Few additional ms to reach the final (and smaller)
+ rcvbuf is a good tradeoff.
+
+ Default : 1000 (1 ms)
+
tcp_mtu_probing - INTEGER
Controls TCP Packetization-Layer Path MTU Discovery. Takes three
values:
@@ -854,9 +864,18 @@ tcp_sack - BOOLEAN
Default: 1 (enabled)
+tcp_comp_sack_rtt_percent - INTEGER
+ Percentage of SRTT used for the compressed SACK feature.
+ See tcp_comp_sack_nr, tcp_comp_sack_delay_ns, tcp_comp_sack_slack_ns.
+
+ Possible values : 1 - 1000
+
+ Default : 33 %
+
tcp_comp_sack_delay_ns - LONG INTEGER
- TCP tries to reduce number of SACK sent, using a timer
- based on 5% of SRTT, capped by this sysctl, in nano seconds.
+ TCP tries to reduce number of SACK sent, using a timer based
+ on tcp_comp_sack_rtt_percent of SRTT, capped by this sysctl
+ in nano seconds.
The default is 1ms, based on TSO autosizing period.
Default : 1,000,000 ns (1 ms)
@@ -866,8 +885,9 @@ tcp_comp_sack_slack_ns - LONG INTEGER
timer used by SACK compression. This gives extra time
for small RTT flows, and reduces system overhead by allowing
opportunistic reduction of timer interrupts.
+ Too big values might reduce goodput.
- Default : 100,000 ns (100 us)
+ Default : 10,000 ns (10 us)
tcp_comp_sack_nr - INTEGER
Max number of SACK that can be compressed.
@@ -1796,6 +1816,23 @@ icmp_errors_use_inbound_ifaddr - BOOLEAN
Default: 0 (disabled)
+icmp_errors_extension_mask - UNSIGNED INTEGER
+ Bitmask of ICMP extensions to append to ICMPv4 error messages
+ ("Destination Unreachable", "Time Exceeded" and "Parameter Problem").
+ The original datagram is trimmed / padded to 128 bytes in order to be
+ compatible with applications that do not comply with RFC 4884.
+
+ Possible extensions are:
+
+ ==== ==============================================================
+ 0x01 Incoming IP interface information according to RFC 5837.
+ Extension will include the index, IPv4 address (if present),
+ name and MTU of the IP interface that received the datagram
+ which elicited the ICMP error.
+ ==== ==============================================================
+
+ Default: 0x00 (no extensions)
+
igmp_max_memberships - INTEGER
Change the maximum number of multicast groups we can subscribe to.
Default: 20
@@ -3262,6 +3299,23 @@ error_anycast_as_unicast - BOOLEAN
Default: 0 (disabled)
+errors_extension_mask - UNSIGNED INTEGER
+ Bitmask of ICMP extensions to append to ICMPv6 error messages
+ ("Destination Unreachable" and "Time Exceeded"). The original datagram
+ is trimmed / padded to 128 bytes in order to be compatible with
+ applications that do not comply with RFC 4884.
+
+ Possible extensions are:
+
+ ==== ==============================================================
+ 0x01 Incoming IP interface information according to RFC 5837.
+ Extension will include the index, IPv6 address (if present),
+ name and MTU of the IP interface that received the datagram
+ which elicited the ICMP error.
+ ==== ==============================================================
+
+ Default: 0x00 (no extensions)
+
xfrm6_gc_thresh - INTEGER
(Obsolete since linux-4.14)
The threshold at which we will start garbage collecting for IPv6
diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
index 7dd60366f4ff..4e008efebb35 100644
--- a/Documentation/networking/napi.rst
+++ b/Documentation/networking/napi.rst
@@ -263,7 +263,9 @@ are not well known).
Busy polling is enabled by either setting ``SO_BUSY_POLL`` on
selected sockets or using the global ``net.core.busy_poll`` and
``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling
-also exists.
+also exists. Threaded polling of NAPI also has a mode to busy poll for
+packets (:ref:`threaded busy polling<threaded_busy_poll>`) using the NAPI
+processing kthread.
epoll-based busy polling
------------------------
@@ -426,6 +428,52 @@ Therefore, setting ``gro_flush_timeout`` and ``napi_defer_hard_irqs`` is
the recommended usage, because otherwise setting ``irq-suspend-timeout``
might not have any discernible effect.
+.. _threaded_busy_poll:
+
+Threaded NAPI busy polling
+--------------------------
+
+Threaded NAPI busy polling extends threaded NAPI and adds support to do
+continuous busy polling of the NAPI. This can be useful for forwarding or
+AF_XDP applications.
+
+Threaded NAPI busy polling can be enabled on per NIC queue basis using Netlink.
+
+For example, using the following script:
+
+.. code-block:: bash
+
+ $ ynl --family netdev --do napi-set \
+ --json='{"id": 66, "threaded": "busy-poll"}'
+
+The kernel will create a kthread that busy polls on this NAPI.
+
+The user may elect to set the CPU affinity of this kthread to an unused CPU
+core to improve how often the NAPI is polled at the expense of wasted CPU
+cycles. Note that this will keep the CPU core busy with 100% usage.
+
+Once threaded busy polling is enabled for a NAPI, PID of the kthread can be
+retrieved using Netlink so the affinity of the kthread can be set up.
+
+For example, the following script can be used to fetch the PID:
+
+.. code-block:: bash
+
+ $ ynl --family netdev --do napi-get --json='{"id": 66}'
+
+This will output something like following, the pid `258` is the PID of the
+kthread that is polling this NAPI.
+
+.. code-block:: bash
+
+ $ {'defer-hard-irqs': 0,
+ 'gro-flush-timeout': 0,
+ 'id': 66,
+ 'ifindex': 2,
+ 'irq-suspend-timeout': 0,
+ 'pid': 258,
+ 'threaded': 'busy-poll'}
+
.. _threaded:
Threaded NAPI
diff --git a/Documentation/networking/net_cachelines/inet_connection_sock.rst b/Documentation/networking/net_cachelines/inet_connection_sock.rst
index 8fae85ebb773..cc2000f55c29 100644
--- a/Documentation/networking/net_cachelines/inet_connection_sock.rst
+++ b/Documentation/networking/net_cachelines/inet_connection_sock.rst
@@ -12,8 +12,8 @@ struct inet_sock icsk_inet read_mostly r
struct request_sock_queue icsk_accept_queue
struct inet_bind_bucket icsk_bind_hash read_mostly tcp_set_state
struct inet_bind2_bucket icsk_bind2_hash read_mostly tcp_set_state,inet_put_port
-struct timer_list icsk_retransmit_timer read_write inet_csk_reset_xmit_timer,tcp_connect
struct timer_list icsk_delack_timer read_mostly inet_csk_reset_xmit_timer,tcp_connect
+struct timer_list icsk_keepalive_timer
u32 icsk_rto read_write tcp_cwnd_validate,tcp_schedule_loss_probe,tcp_connect_init,tcp_connect,tcp_write_xmit,tcp_push_one
u32 icsk_rto_min
u32 icsk_rto_max read_mostly tcp_reset_xmit_timer
diff --git a/Documentation/networking/net_cachelines/inet_sock.rst b/Documentation/networking/net_cachelines/inet_sock.rst
index b11bf48fa2b3..4c72a28a7012 100644
--- a/Documentation/networking/net_cachelines/inet_sock.rst
+++ b/Documentation/networking/net_cachelines/inet_sock.rst
@@ -5,42 +5,43 @@
inet_sock struct fast path usage breakdown
==========================================
-======================= ===================== =================== =================== ======================================================================================================
-Type Name fastpath_tx_access fastpath_rx_access comment
-======================= ===================== =================== =================== ======================================================================================================
-struct sock sk read_mostly read_mostly tcp_init_buffer_space,tcp_init_transfer,tcp_finish_connect,tcp_connect,tcp_send_rcvq,tcp_send_syn_data
-struct ipv6_pinfo* pinet6
-be16 inet_sport read_mostly __tcp_transmit_skb
-be32 inet_daddr read_mostly ip_select_ident_segs
-be32 inet_rcv_saddr
-be16 inet_dport read_mostly __tcp_transmit_skb
-u16 inet_num
-be32 inet_saddr
-s16 uc_ttl read_mostly __ip_queue_xmit/ip_select_ttl
-u16 cmsg_flags
-struct ip_options_rcu* inet_opt read_mostly __ip_queue_xmit
-u16 inet_id read_mostly ip_select_ident_segs
-u8 tos read_mostly ip_queue_xmit
-u8 min_ttl
-u8 mc_ttl
-u8 pmtudisc
-u8:1 recverr
-u8:1 is_icsk
-u8:1 freebind
-u8:1 hdrincl
-u8:1 mc_loop
-u8:1 transparent
-u8:1 mc_all
-u8:1 nodefrag
-u8:1 bind_address_no_port
-u8:1 recverr_rfc4884
-u8:1 defer_connect read_mostly tcp_sendmsg_fastopen
-u8 rcv_tos
-u8 convert_csum
-int uc_index
-int mc_index
-be32 mc_addr
-struct ip_mc_socklist* mc_list
-struct inet_cork_full cork read_mostly __tcp_transmit_skb
-struct local_port_range
-======================= ===================== =================== =================== ======================================================================================================
+======================== ===================== =================== =================== ======================================================================================================
+Type Name fastpath_tx_access fastpath_rx_access comment
+======================== ===================== =================== =================== ======================================================================================================
+struct sock sk read_mostly read_mostly tcp_init_buffer_space,tcp_init_transfer,tcp_finish_connect,tcp_connect,tcp_send_rcvq,tcp_send_syn_data
+struct ipv6_pinfo* pinet6
+struct ipv6_fl_socklist* ipv6_fl_list read_mostly tcp_v6_connect,__ip6_datagram_connect,udpv6_sendmsg,rawv6_sendmsg
+be16 inet_sport read_mostly __tcp_transmit_skb
+be32 inet_daddr read_mostly ip_select_ident_segs
+be32 inet_rcv_saddr
+be16 inet_dport read_mostly __tcp_transmit_skb
+u16 inet_num
+be32 inet_saddr
+s16 uc_ttl read_mostly __ip_queue_xmit/ip_select_ttl
+u16 cmsg_flags
+struct ip_options_rcu* inet_opt read_mostly __ip_queue_xmit
+u16 inet_id read_mostly ip_select_ident_segs
+u8 tos read_mostly ip_queue_xmit
+u8 min_ttl
+u8 mc_ttl
+u8 pmtudisc
+u8:1 recverr
+u8:1 is_icsk
+u8:1 freebind
+u8:1 hdrincl
+u8:1 mc_loop
+u8:1 transparent
+u8:1 mc_all
+u8:1 nodefrag
+u8:1 bind_address_no_port
+u8:1 recverr_rfc4884
+u8:1 defer_connect read_mostly tcp_sendmsg_fastopen
+u8 rcv_tos
+u8 convert_csum
+int uc_index
+int mc_index
+be32 mc_addr
+struct ip_mc_socklist* mc_list
+struct inet_cork_full cork read_mostly __tcp_transmit_skb
+struct local_port_range
+======================== ===================== =================== =================== ======================================================================================================
diff --git a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
index 6e7b20afd2d4..beaf1880a19b 100644
--- a/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
+++ b/Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
@@ -102,7 +102,8 @@ u8 sysctl_tcp_app_win
u8 sysctl_tcp_frto tcp_enter_loss
u8 sysctl_tcp_nometrics_save TCP_LAST_ACK/tcp_update_metrics
u8 sysctl_tcp_no_ssthresh_metrics_save TCP_LAST_ACK/tcp_(update/init)_metrics
-u8 sysctl_tcp_moderate_rcvbuf read_mostly read_mostly tcp_tso_should_defer(tx);tcp_rcv_space_adjust(rx)
+u8 sysctl_tcp_moderate_rcvbuf read_mostly tcp_rcvbuf_grow()
+u32 sysctl_tcp_rcvbuf_low_rtt read_mostly tcp_rcvbuf_grow()
u8 sysctl_tcp_tso_win_divisor read_mostly tcp_tso_should_defer(tcp_write_xmit)
u8 sysctl_tcp_workaround_signed_windows tcp_select_window
int sysctl_tcp_limit_output_bytes read_mostly tcp_small_queue_check(tcp_write_xmit)
diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst
index 2555e75e5cc1..4ab5d7b05cf1 100644
--- a/Documentation/networking/netconsole.rst
+++ b/Documentation/networking/netconsole.rst
@@ -88,7 +88,7 @@ for example:
nc -u -l -p <port>' / 'nc -u -l <port>
- or::
+ or::
netcat -u -l -p <port>' / 'netcat -u -l <port>
diff --git a/Documentation/networking/nfc.rst b/Documentation/networking/nfc.rst
index 9aab3a88c9b2..401735006143 100644
--- a/Documentation/networking/nfc.rst
+++ b/Documentation/networking/nfc.rst
@@ -71,7 +71,8 @@ Userspace interface
The userspace interface is divided in control operations and low-level data
exchange operation.
-CONTROL OPERATIONS:
+Control operations
+------------------
Generic netlink is used to implement the interface to the control operations.
The operations are composed by commands and events, all listed below:
@@ -100,7 +101,8 @@ relevant information such as the supported NFC protocols.
All polling operations requested through one netlink socket are stopped when
it's closed.
-LOW-LEVEL DATA EXCHANGE:
+Low-level data exchange
+-----------------------
The userspace must use PF_NFC sockets to perform any data communication with
targets. All NFC sockets use AF_NFC::
diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
index a874d007f2db..904a910f198e 100644
--- a/Documentation/networking/smc-sysctl.rst
+++ b/Documentation/networking/smc-sysctl.rst
@@ -71,3 +71,43 @@ smcr_max_conns_per_lgr - INTEGER
acceptable value ranges from 16 to 255. Only for SMC-R v2.1 and later.
Default: 255
+
+smcr_max_send_wr - INTEGER
+ So-called work request buffers are SMCR link (and RDMA queue pair) level
+ resources necessary for performing RDMA operations. Since up to 255
+ connections can share a link group and thus also a link and the number
+ of the work request buffers is decided when the link is allocated,
+ depending on the workload it can be a bottleneck in a sense that threads
+ have to wait for work request buffers to become available. Before the
+ introduction of this control the maximal number of work request buffers
+ available on the send path used to be hard coded to 16. With this control
+ it becomes configurable. The acceptable range is between 2 and 2048.
+
+ Please be aware that all the buffers need to be allocated as a physically
+ continuous array in which each element is a single buffer and has the size
+ of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we keep retrying
+ with half of the buffer count until it is ether successful or (unlikely)
+ we dip below the old hard coded value which is 16 where we give up much
+ like before having this control.
+
+ Default: 16
+
+smcr_max_recv_wr - INTEGER
+ So-called work request buffers are SMCR link (and RDMA queue pair) level
+ resources necessary for performing RDMA operations. Since up to 255
+ connections can share a link group and thus also a link and the number
+ of the work request buffers is decided when the link is allocated,
+ depending on the workload it can be a bottleneck in a sense that threads
+ have to wait for work request buffers to become available. Before the
+ introduction of this control the maximal number of work request buffers
+ available on the receive path used to be hard coded to 16. With this control
+ it becomes configurable. The acceptable range is between 2 and 2048.
+
+ Please be aware that all the buffers need to be allocated as a physically
+ continuous array in which each element is a single buffer and has the size
+ of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we keep retrying
+ with half of the buffer count until it is ether successful or (unlikely)
+ we dip below the old hard coded value which is 16 where we give up much
+ like before having this control.
+
+ Default: 48
diff --git a/Documentation/networking/statistics.rst b/Documentation/networking/statistics.rst
index 518284e287b0..66b0ef941457 100644
--- a/Documentation/networking/statistics.rst
+++ b/Documentation/networking/statistics.rst
@@ -184,9 +184,11 @@ Protocol-related statistics can be requested in get commands by setting
the `ETHTOOL_FLAG_STATS` flag in `ETHTOOL_A_HEADER_FLAGS`. Currently
statistics are supported in the following commands:
- - `ETHTOOL_MSG_PAUSE_GET`
- `ETHTOOL_MSG_FEC_GET`
+ - `ETHTOOL_MSG_LINKSTATE_GET`
- `ETHTOOL_MSG_MM_GET`
+ - `ETHTOOL_MSG_PAUSE_GET`
+ - `ETHTOOL_MSG_TSINFO_GET`
debugfs
-------
diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst
index 36cc7afc2527..980c442d7161 100644
--- a/Documentation/networking/tls.rst
+++ b/Documentation/networking/tls.rst
@@ -280,6 +280,26 @@ If the record decrypted turns out to had been padded or is not a data
record it will be decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.
+TLS_TX_MAX_PAYLOAD_LEN
+~~~~~~~~~~~~~~~~~~~~~~
+
+Specifies the maximum size of the plaintext payload for transmitted TLS records.
+
+When this option is set, the kernel enforces the specified limit on all outgoing
+TLS records. No plaintext fragment will exceed this size. This option can be used
+to implement the TLS Record Size Limit extension [1].
+
+* For TLS 1.2, the value corresponds directly to the record size limit.
+* For TLS 1.3, the value should be set to record_size_limit - 1, since
+ the record size limit includes one additional byte for the ContentType
+ field.
+
+The valid range for this option is 64 to 16384 bytes for TLS 1.2, and 63 to
+16384 bytes for TLS 1.3. The lower minimum for TLS 1.3 accounts for the
+extra byte used by the ContentType field.
+
+[1] https://datatracker.ietf.org/doc/html/rfc8449
+
Statistics
==========
diff --git a/Documentation/networking/xfrm/index.rst b/Documentation/networking/xfrm/index.rst
new file mode 100644
index 000000000000..7d866da836fe
--- /dev/null
+++ b/Documentation/networking/xfrm/index.rst
@@ -0,0 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+XFRM Framework
+==============
+
+.. toctree::
+ :maxdepth: 2
+
+ xfrm_device
+ xfrm_proc
+ xfrm_sync
+ xfrm_sysctl
diff --git a/Documentation/networking/xfrm_device.rst b/Documentation/networking/xfrm/xfrm_device.rst
index 122204da0fff..b0d85a5f57d1 100644
--- a/Documentation/networking/xfrm_device.rst
+++ b/Documentation/networking/xfrm/xfrm_device.rst
@@ -20,11 +20,15 @@ can radically increase throughput and decrease CPU utilization. The XFRM
Device interface allows NIC drivers to offer to the stack access to the
hardware offload.
-Right now, there are two types of hardware offload that kernel supports.
+Right now, there are two types of hardware offload that kernel supports:
+
* IPsec crypto offload:
+
* NIC performs encrypt/decrypt
* Kernel does everything else
+
* IPsec packet offload:
+
* NIC performs encrypt/decrypt
* NIC does encapsulation
* Kernel and NIC have SA and policy in-sync
@@ -34,7 +38,7 @@ Right now, there are two types of hardware offload that kernel supports.
Userland access to the offload is typically through a system such as
libreswan or KAME/raccoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting. An example command might look something
-like this for crypto offload:
+like this for crypto offload::
ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
reqid 0x07 replay-window 32 \
@@ -42,7 +46,7 @@ like this for crypto offload:
sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
offload dev eth4 dir in
-and for packet offload
+and for packet offload::
ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
reqid 0x07 replay-window 32 \
@@ -153,26 +157,26 @@ the packet's skb. At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().
- find and hold the SA that was used to the Rx skb::
+1. Find and hold the SA that was used to the Rx skb::
- get spi, protocol, and destination IP from packet headers
+ /* get spi, protocol, and destination IP from packet headers */
xs = find xs from (spi, protocol, dest_IP)
xfrm_state_hold(xs);
- store the state information into the skb::
+2. Store the state information into the skb::
sp = secpath_set(skb);
if (!sp) return;
sp->xvec[sp->len++] = xs;
sp->olen++;
- indicate the success and/or error status of the offload::
+3. Indicate the success and/or error status of the offload::
xo = xfrm_offload(skb);
xo->flags = CRYPTO_DONE;
xo->status = crypto_status;
- hand the packet to napi_gro_receive() as usual
+4. Hand the packet to napi_gro_receive() as usual.
In ESN mode, xdo_dev_state_advance_esn() is called from
xfrm_replay_advance_esn() for RX, and xfrm_replay_overflow_offload_esn for TX.
diff --git a/Documentation/networking/xfrm_proc.rst b/Documentation/networking/xfrm/xfrm_proc.rst
index 973d1571acac..973d1571acac 100644
--- a/Documentation/networking/xfrm_proc.rst
+++ b/Documentation/networking/xfrm/xfrm_proc.rst
diff --git a/Documentation/networking/xfrm_sync.rst b/Documentation/networking/xfrm/xfrm_sync.rst
index 6246503ceab2..dfc2ec0df380 100644
--- a/Documentation/networking/xfrm_sync.rst
+++ b/Documentation/networking/xfrm/xfrm_sync.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
-====
-XFRM
-====
+=========
+XFRM sync
+=========
The sync patches work is based on initial patches from
Krisztian <hidden@balabit.hu> and others and additional patches
@@ -36,7 +36,7 @@ is not driven by packet arrival.
- the replay sequence for both inbound and outbound
1) Message Structure
-----------------------
+--------------------
nlmsghdr:aevent_id:optional-TLVs.
@@ -83,31 +83,31 @@ when going from kernel to user space)
A program needs to subscribe to multicast group XFRMNLGRP_AEVENTS
to get notified of these events.
-2) TLVS reflect the different parameters:
------------------------------------------
+2) TLVS reflect the different parameters
+----------------------------------------
a) byte value (XFRMA_LTIME_VAL)
-This TLV carries the running/current counter for byte lifetime since
-last event.
+ This TLV carries the running/current counter for byte lifetime since
+ last event.
-b)replay value (XFRMA_REPLAY_VAL)
+b) replay value (XFRMA_REPLAY_VAL)
-This TLV carries the running/current counter for replay sequence since
-last event.
+ This TLV carries the running/current counter for replay sequence since
+ last event.
-c)replay threshold (XFRMA_REPLAY_THRESH)
+c) replay threshold (XFRMA_REPLAY_THRESH)
-This TLV carries the threshold being used by the kernel to trigger events
-when the replay sequence is exceeded.
+ This TLV carries the threshold being used by the kernel to trigger events
+ when the replay sequence is exceeded.
d) expiry timer (XFRMA_ETIMER_THRESH)
-This is a timer value in milliseconds which is used as the nagle
-value to rate limit the events.
+ This is a timer value in milliseconds which is used as the nagle
+ value to rate limit the events.
-3) Default configurations for the parameters:
----------------------------------------------
+3) Default configurations for the parameters
+--------------------------------------------
By default these events should be turned off unless there is
at least one listener registered to listen to the multicast
@@ -121,12 +121,14 @@ in case they are not specified.
the two sysctls/proc entries are:
a) /proc/sys/net/core/sysctl_xfrm_aevent_etime
-used to provide default values for the XFRMA_ETIMER_THRESH in incremental
-units of time of 100ms. The default is 10 (1 second)
+
+ Used to provide default values for the XFRMA_ETIMER_THRESH in incremental
+ units of time of 100ms. The default is 10 (1 second)
b) /proc/sys/net/core/sysctl_xfrm_aevent_rseqth
-used to provide default values for XFRMA_REPLAY_THRESH parameter
-in incremental packet count. The default is two packets.
+
+ Used to provide default values for XFRMA_REPLAY_THRESH parameter
+ in incremental packet count. The default is two packets.
4) Message types
----------------
@@ -134,50 +136,51 @@ in incremental packet count. The default is two packets.
a) XFRM_MSG_GETAE issued by user-->kernel.
XFRM_MSG_GETAE does not carry any TLVs.
-The response is a XFRM_MSG_NEWAE which is formatted based on what
-XFRM_MSG_GETAE queried for.
+ The response is a XFRM_MSG_NEWAE which is formatted based on what
+ XFRM_MSG_GETAE queried for.
+
+ The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-* if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved
-* if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved
+ * if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved
+ * if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved
b) XFRM_MSG_NEWAE is issued by either user space to configure
or kernel to announce events or respond to a XFRM_MSG_GETAE.
-i) user --> kernel to configure a specific SA.
+ i) user --> kernel to configure a specific SA.
-any of the values or threshold parameters can be updated by passing the
-appropriate TLV.
+ any of the values or threshold parameters can be updated by passing the
+ appropriate TLV.
-A response is issued back to the sender in user space to indicate success
-or failure.
+ A response is issued back to the sender in user space to indicate success
+ or failure.
-In the case of success, additionally an event with
-XFRM_MSG_NEWAE is also issued to any listeners as described in iii).
+ In the case of success, additionally an event with
+ XFRM_MSG_NEWAE is also issued to any listeners as described in iii).
-ii) kernel->user direction as a response to XFRM_MSG_GETAE
+ ii) kernel->user direction as a response to XFRM_MSG_GETAE
-The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
+ The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-The threshold TLVs will be included if explicitly requested in
-the XFRM_MSG_GETAE message.
+ The threshold TLVs will be included if explicitly requested in
+ the XFRM_MSG_GETAE message.
-iii) kernel->user to report as event if someone sets any values or
- thresholds for an SA using XFRM_MSG_NEWAE (as described in #i above).
- In such a case XFRM_AE_CU flag is set to inform the user that
- the change happened as a result of an update.
- The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
+ iii) kernel->user to report as event if someone sets any values or
+ thresholds for an SA using XFRM_MSG_NEWAE (as described in #i above).
+ In such a case XFRM_AE_CU flag is set to inform the user that
+ the change happened as a result of an update.
+ The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-iv) kernel->user to report event when replay threshold or a timeout
- is exceeded.
+ iv) kernel->user to report event when replay threshold or a timeout
+ is exceeded.
In such a case either XFRM_AE_CR (replay exceeded) or XFRM_AE_CE (timeout
happened) is set to inform the user what happened.
Note the two flags are mutually exclusive.
The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.
-Exceptions to threshold settings
---------------------------------
+5) Exceptions to threshold settings
+-----------------------------------
If you have an SA that is getting hit by traffic in bursts such that
there is a period where the timer threshold expires with no packets
diff --git a/Documentation/networking/xfrm_sysctl.rst b/Documentation/networking/xfrm/xfrm_sysctl.rst
index 47b9bbdd0179..7d0c4b17c0bd 100644
--- a/Documentation/networking/xfrm_sysctl.rst
+++ b/Documentation/networking/xfrm/xfrm_sysctl.rst
@@ -4,8 +4,8 @@
XFRM Syscall
============
-/proc/sys/net/core/xfrm_* Variables:
-====================================
+/proc/sys/net/core/xfrm_* Variables
+===================================
xfrm_acq_expires - INTEGER
default 30 - hard timeout in seconds for acquire requests