| Age | Commit message (Collapse) | Author |
|
This ethtool counter is meant to help with observing how many times the
congestion event was triggered but on query there was no state change.
This would help to indicate when a work item was scheduled to run too
late and in the meantime the congestion state changed back to previous
state.
While at it, do a driveby typo fix in documentation for
pci_bw_inbound_high.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1757237976-531416-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add devlink driverinit parameters for configuring the thresholds for
PCIe congestion events. These parameters are registered only when the
firmware supports this feature.
Update the mlx5 devlink docs as well on these new params.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1757237976-531416-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some devices support both symmetric (same value for all PFs) and
asymmetric, while others only support symmetric configuration. This
implementation prefers asymmetric, since it is closer to the devlink
model (per function settings), but falls back to symmetric when needed.
Example usage:
devlink dev param set pci/0000:01:00.0 name total_vfs value <u16> cmode permanent
devlink dev reload pci/0000:01:00.0 action fw_activate
echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove
echo 1 >/sys/bus/pci/rescan
cat /sys/bus/pci/devices/0000:01:00.0/sriov_totalvfs
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Kamal Heib <kheib@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Example usage:
devlink dev param set pci/0000:01:00.0 name enable_sriov value {true, false} cmode permanent
devlink dev reload pci/0000:01:00.0 action fw_activate
echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove
echo 1 >/sys/bus/pci/rescan
grep ^ /sys/bus/pci/devices/0000:01:00.0/sriov_*
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Tested-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Selects which algorithm should be used by the NIC in order to decide rate of
CQE compression dependeng on PCIe bus conditions.
Supported values:
1) balanced, merges fewer CQEs, resulting in a moderate compression ratio
but maintaining a balance between bandwidth savings and performance
2) aggressive, merges more CQEs into a single entry, achieving a higher
compression rate and maximizing performance, particularly under high
traffic loads.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
NICs are typically configured with total_vfs=0, forcing users to rely
on external tools to enable SR-IOV (a widely used and essential feature).
Add total_vfs parameter to devlink for SR-IOV max VF configurability.
Enables standard kernel tools to manage SR-IOV, addressing the need for
flexible VF configuration.
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Tested-by: Kamal Heib <kheib@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250907012953.301746-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently the ADD_ADDR option is retransmitted with a fixed timeout. This
patch makes the retransmission timeout adaptive by using the maximum RTO
among all the subflows, while still capping it at the configured maximum
value (add_addr_timeout_max). This improves responsiveness when
establishing new subflows.
Specifically:
1. Adds mptcp_adjust_add_addr_timeout() helper to compute the adaptive
timeout.
2. Uses maximum subflow RTO (icsk_rto) when available.
3. Applies exponential backoff based on retransmission count.
4. Maintains fallback to configured max timeout when no RTO data exists.
This slightly changes the behaviour of the MPTCP "add_addr_timeout"
sysctl knob to be used as a maximum instead of a fixed value. But this
is seen as an improvement: the ADD_ADDR might be sent quicker than
before to improve the overall MPTCP connection. Also, the default
value is set to 2 min, which was already way too long, and caused the
ADD_ADDR not to be retransmitted for connections shorter than 2 minutes.
Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/576
Reviewed-by: Christoph Paasch <cpaasch@openai.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250907-net-next-mptcp-add_addr-retrans-adapt-v1-1-824cc805772b@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The net.mptcp.pm_type sysctl knob has been deprecated in v6.15,
net.mptcp.path_manager should be used instead.
Adapt the section about path managers to suggest using the new sysctl
knob instead of the deprecated one.
Fixes: 595c26d122d1 ("mptcp: sysctl: set path manager by name")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-2-5f2168a66079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
array
The documentation of the 'bcm_msg_head' struct does not match how
it is defined in 'bcm.h'. Changed the frames member to a flexible array,
matching the definition in the header file.
See commit 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with
flexible-array members")
Signed-off-by: Alex Tran <alex.t.tran@gmail.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20250904031709.1426895-1-alex.t.tran@gmail.com
Fixes: 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217783
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Add a new ad_select policy 'port_priority' that uses the per-port
actor priority values (set via ad_actor_port_prio) to determine
aggregator selection.
This allows administrators to influence which ports are preferred
for aggregation by assigning different priority values, providing
more flexible load balancing control in LACP configurations.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902064501.360822-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Introduce a new netlink attribute 'actor_port_prio' to allow setting
the LACP actor port priority on a per-slave basis. This extends the
existing bonding infrastructure to support more granular control over
LACP negotiations.
The priority value is embedded in LACPDU packets and will be used by
subsequent patches to influence aggregator selection policies.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250902064501.360822-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.17-rc5).
No conflicts.
Adjacent changes:
include/net/sock.h
c51613fa276f ("net: add sk->sk_drop_counters")
5d6b58c932ec ("net: lockless sock_i_ino()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the implementation of use_carrier, the link monitoring
method that utilizes ethtool or ioctl to determine the link state of an
interface in a bond. Bonding will always behaves as if use_carrier=1,
which relies on netif_carrier_ok() to determine the link state of
interfaces.
To avoid acquiring RTNL many times per second, bonding inspects
link state under RCU, but not under RTNL. However, ethtool
implementations in drivers may sleep, and therefore this strategy is
unsuitable for use with calls into driver ethtool functions.
The use_carrier option was introduced in 2003, to provide
backwards compatibility for network device drivers that did not support
the then-new netif_carrier_ok/on/off system. Device drivers are now
expected to support netif_carrier_*, and the use_carrier backwards
compatibility logic is no longer necessary.
The option itself remains, but when queried always returns 1,
and may only be set to 1.
Link: https://lore.kernel.org/000000000000eb54bf061cfd666a@google.com
Link: https://lore.kernel.org/20240718122017.d2e33aaac43a.I10ab9c9ded97163aef4e4de10985cd8f7de60d28@changeid
Signed-off-by: Jay Vosburgh <jv@jvosburgh.net>
Reported-by: syzbot+b8c48ea38ca27d150063@syzkaller.appspotmail.com
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/2029487.1756512517@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 2677010e7793 ("Add support to set NAPI threaded for individual
NAPI") introduced threaded NAPI configuration per individual NAPI
instance, however obsolete description that threaded NAPI is per device
has remained.
Remove the old description and clarify that only NAPI instances running
in threaded mode spawn kernel threads by changing "Each NAPI instance"
to "Each threaded NAPI instance".
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20250829064857.51503-1-enjuk@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
zcrx currently requires the ring to be set up with fixed 32b CQEs,
allow it to use IORING_SETUP_CQE_MIXED as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Enable configuration of the burst period — a time window starting
from the first error recovery, during which the reporter allows
recovery attempts for each reported error.
This feature is helpful when a single underlying issue causes multiple
errors, as it delays the start of the grace period to allow sufficient
time for recovering all related errors. For example, if multiple TX
queues time out simultaneously, a sufficient burst period could allow
all affected TX queues to be recovered within that window. Without this
period, only the first TX queue that reports a timeout will undergo
recovery, while the remaining TX queues will be blocked once the grace
period begins.
Configuration example:
$ devlink health set pci/0000:00:09.0 reporter tx burst_period 500
Configuration example with ynl:
./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/devlink.yaml \
--do health-reporter-set --json '{
"bus-name": "auxiliary",
"dev-name": "mlx5_core.eth.0",
"port-index": 65535,
"health-reporter-name": "tx",
"health-reporter-burst-period": 500
}'
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250824084354.533182-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.17-rc3).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add description and high-level diagram for PPE, driver overview and
module enable/debug information.
Signed-off-by: Lei Wei <quic_leiwei@quicinc.com>
Signed-off-by: Luo Jie <quic_luoj@quicinc.com>
Link: https://patch.msgid.link/20250818-qcom_ipq_ppe-v8-2-1d4ff641fce9@quicinc.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
SO_RCVBUF and SO_SNDBUF have limited range today, unless
distros or system admins change rmem_max and wmem_max.
Even iproute2 uses 1 MB SO_RCVBUF which is capped by
the kernel.
Decouple [rw]mem_max and [rw]mem_default and increase
[rw]mem_max to 4 MB.
Before:
$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.wmem_default = 212992
net.core.wmem_max = 212992
After:
$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 4194304
net.core.wmem_default = 212992
net.core.wmem_max = 4194304
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250819174030.1986278-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The upgrade of the cookie authentication algorithm to HMAC-SHA256 kept
some backwards compatibility for the net.sctp.cookie_hmac_alg sysctl by
still accepting the values 'md5' and 'sha1'. Those algorithms are no
longer actually used, but rather those values were just treated as
requests to enable cookie authentication.
As requested at
https://lore.kernel.org/netdev/CADvbK_fmCRARc8VznH8cQa-QKaCOQZ6yFbF=1-VDK=zRqv_cXw@mail.gmail.com/
and https://lore.kernel.org/netdev/20250818084345.708ac796@kernel.org/ ,
go further and start rejecting 'md5' and 'sha1' completely.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250818205426.30222-6-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Convert SCTP cookies to use HMAC-SHA256, instead of the previous choice
of the legacy algorithms HMAC-MD5 and HMAC-SHA1. Simplify and optimize
the code by using the HMAC-SHA256 library instead of crypto_shash, and
by preparing the HMAC key when it is generated instead of per-operation.
This doesn't break compatibility, since the cookie format is an
implementation detail, not part of the SCTP protocol itself.
Note that the cookie size doesn't change either. The HMAC field was
already 32 bytes, even though previously at most 20 bytes were actually
compared. 32 bytes exactly fits an untruncated HMAC-SHA256 value. So,
although we could safely truncate the MAC to something slightly shorter,
for now just keep the cookie size the same.
I also considered SipHash, but that would generate only 8-byte MACs. An
8-byte MAC *might* suffice here. However, there's quite a lot of
information in the SCTP cookies: more than in TCP SYN cookies. So
absent an analysis that occasional forgeries of all that information is
okay in SCTP, I errored on the side of caution.
Remove HMAC-MD5 and HMAC-SHA1 as options, since the new HMAC-SHA256
option is just better. It's faster as well as more secure. For
example, benchmarking on x86_64, cookie authentication is now nearly 3x
as fast as the previous default choice and implementation of HMAC-MD5.
Also just make the kernel always support cookie authentication if SCTP
is supported at all, rather than making it optional in the build. (It
was sort of optional before, but it didn't really work properly. E.g.,
a kernel with CONFIG_SCTP_COOKIE_HMAC_MD5=n still supported HMAC-MD5
cookie authentication if CONFIG_CRYPTO_HMAC and CONFIG_CRYPTO_MD5
happened to be enabled in the kconfig for other reasons.)
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250818205426.30222-5-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for XDP statistics collection and reporting via rtnl_link
and netdev_queue API.
For XDP programs without frags support, fbnic requires MTU to be less
than the HDS threshold. If an over-sized frame is received, the frame
is dropped and recorded as rx_length_errors reported via ip stats to
highlight that this is an error.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
Link: https://patch.msgid.link/20250813221319.3367670-9-mohsin.bashr@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When add_addr_timeout was set to 0, this caused the ADD_ADDR to be
retransmitted immediately, which looks like a buggy behaviour. Instead,
interpret 0 as "no retransmissions needed".
The documentation is updated to explicitly state that setting the timeout
to 0 disables retransmission.
Fixes: 93f323b9cccc ("mptcp: add a new sysctl add_addr_timeout")
Cc: stable@vger.kernel.org
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250815-net-mptcp-misc-fixes-6-17-rc2-v1-5-521fe9957892@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A big set of typo fixes from Bjorn Helgaas
|
|
Fix typos.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250813200526.290420-7-helgaas@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-docs
Mauro Carvalho Chehab says:
====================
add a generic yaml parser integrated with Netlink specs generation
- An YAML parser Sphinx plugin, integrated with Netlink YAML doc
parser.
The patch content is identical to my v10 submission:
https://lore.kernel.org/cover.1753718185.git.mchehab+huawei@kernel.org
* tag 'docs/v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-docs:
sphinx: parser_yaml.py: fix line numbers information
docs: parser_yaml.py: fix backward compatibility with old docutils
docs: parser_yaml.py: add support for line numbers from the parser
tools: netlink_yml_parser.py: add line numbers to parsed data
MAINTAINERS: add netlink_yml_parser.py to linux-doc
docs: netlink: remove obsolete .gitignore from unused directory
tools: ynl_gen_rst.py: drop support for generating index files
docs: uapi: netlink: update netlink specs link
docs: use parser_yaml extension to handle Netlink specs
docs: sphinx: add a parser for yaml files for Netlink specs
tools: ynl_gen_rst.py: cleanup coding style
docs: netlink: index.rst: add a netlink index file
tools: ynl_gen_rst.py: Split library from command line tool
docs: netlink: netlink-raw.rst: use :ref: instead of :doc:
====================
Link: https://patch.msgid.link/20250812113329.356c93c2@foz.lan
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
udp_child_ehash_entries -> udp_child_hash_entries
Fixes: 9804985bf27f ("udp: Introduce optional per-netns hash table.")
Signed-off-by: Jordan Rife <jordan@jrife.io>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250808185800.1189042-1-jordan@jrife.io
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The previous code was generating source rst files
under Documentation/networking/netlink_spec/. With the
Sphinx YAML parser, this is now gone. So, stop ignoring
*.rst files inside netlink specs directory.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
|
|
Instead of manually calling ynl_gen_rst.py, use a Sphinx extension.
This way, no .rst files would be written to the Kernel source
directories.
We are using here a toctree with :glob: property. This way, there
is no need to touch the netlink/specs/index.rst file every time
a new Netlink spec is added/renamed/removed.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
|
|
Pull documentation updates from Jonathan Corbet:
"It has been a relatively busy cycle for docs, especially the build
system:
- The Perl kernel-doc script was added to 2.3.52pre1 just after the
turn of the millennium. Over the following 25 years, it accumulated
a vast amount of cruft, all in a language few people want to deal
with anymore. Mauro's Python replacement in 6.16 faithfully
reproduced all of the cruft in the hope of avoiding regressions.
Now that we have a more reasonable code base, though, we can work
on cleaning it up; many of the changes this time around are toward
that end.
- A reorganization of the ext4 docs into the usual TOC format.
- Various Chinese translations and updates.
- A new script from Mauro to help with docs-build testing.
- A new document for linked lists
- A sweep through MAINTAINERS fixing broken GitHub git:// repository
links.
...and lots of fixes and updates"
* tag 'docs-6.17' of git://git.lwn.net/linux: (147 commits)
scripts: add origin commit identification based on specific patterns
sphinx: kernel_abi: fix performance regression with O=<dir>
Documentation: core-api: entry: Replace deprecated KVM entry/exit functions
docs: fault-injection: drop reference to md-faulty
docs: document linked lists
scripts: kdoc: make it backward-compatible with Python 3.7
docs: kernel-doc: emit warnings for ancient versions of Python
Documentation/rtla: Describe exit status
Documentation/rtla: Add include common_appendix.rst
docs: kernel: Clarify printk_ratelimit_burst reset behavior
Documentation: ioctl-number: Don't repeat macro names
Documentation: ioctl-number: Shorten macros table
Documentation: ioctl-number: Correct full path to papr-physical-attestation.h
Documentation: ioctl-number: Extend "Include File" column width
Documentation: ioctl-number: Fix linuxppc-dev mailto link
overlayfs.rst: fix typos
docs: kdoc: emit a warning for ancient versions of Python
docs: kdoc: clean up check_sections()
docs: kdoc: directly access the always-there KdocItem fields
docs: kdoc: straighten up dump_declaration()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2025-07-25
The first patch is by Khaled Elnaggar and converts the janz-ican3
driver's fwinfo_show() to sysfs_emit().
Vincent Mailhol contributes 3 patches that first fix a warning in the
ti_hecc driver and then add missing COMPILE_TEST more compile
coverage to the ti_hecc and tscan1 driver.
Randy Dunlap's patch let's the tscan1 driver depend on PC104.
A patch by Luis Felipe Hernandez fixes a kernel-doc error in the
ctucanfd driver.
Jimmy Assarsson contributes 10 patches for the kvaser_pciefd and 11
for the kvaser_usb driver. Both series simplify the identification of
physical the CAN interfaces and add devlink support to get information
about the running firmware.
* tag 'linux-can-next-for-6.17-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (27 commits)
Documentation: devlink: add devlink documentation for the kvaser_usb driver
can: kvaser_usb: Add devlink port support
can: kvaser_usb: Expose device information via devlink info_get()
can: kvaser_usb: Add devlink support
can: kvaser_usb: Store additional device information
can: kvaser_usb: Store the different firmware version components in a struct
can: kvaser_usb: Move comment regarding max_tx_urbs
can: kvaser_usb: Add intermediate variables
can: kvaser_usb: Assign netdev.dev_port based on device channel index
can: kvaser_usb: Add support for ethtool set_phys_id()
can: kvaser_usb: Add support to control CAN LEDs on device
Documentation: devlink: add devlink documentation for the kvaser_pciefd driver
can: kvaser_pciefd: Add devlink port support
can: kvaser_pciefd: Expose device firmware version via devlink info_get()
can: kvaser_pciefd: Add devlink support
can: kvaser_pciefd: Split driver into C-file and header-file.
can: kvaser_pciefd: Store device channel index
can: kvaser_pciefd: Store the different firmware version components in a struct
can: kvaser_pciefd: Add intermediate variable for device struct in probe()
can: kvaser_pciefd: Add support for ethtool set_phys_id()
...
====================
Link: https://patch.msgid.link/20250725161327.4165174-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It is currently impossible to enable ipv6 forwarding on a per-interface
basis like in ipv4. To enable forwarding on an ipv6 interface we need to
enable it on all interfaces and disable it on the other interfaces using
a netfilter rule. This is especially cumbersome if you have lots of
interfaces and only want to enable forwarding on a few. According to the
sysctl docs [0] the `net.ipv6.conf.all.forwarding` enables forwarding
for all interfaces, while the interface-specific
`net.ipv6.conf.<interface>.forwarding` configures the interface
Host/Router configuration.
Introduce a new sysctl flag `force_forwarding`, which can be set on every
interface. The ip6_forwarding function will then check if the global
forwarding flag OR the force_forwarding flag is active and forward the
packet.
To preserve backwards-compatibility reset the flag (on all interfaces)
to 0 if the net.ipv6.conf.all.forwarding flag is set to 0.
Add a short selftest that checks if a packet gets forwarded with and
without `force_forwarding`.
[0]: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
Link: https://patch.msgid.link/20250722081847.132632-1-g.goller@proxmox.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
List the version information reported by the kvaser_usb driver
through devlink.
Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://patch.msgid.link/20250725123452.41-12-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
List the version information reported by the kvaser_pciefd driver
through devlink.
Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Jimmy Assarsson <extja@kvaser.com>
Link: https://patch.msgid.link/20250725123230.8-11-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Instead of using '0' and '1' for napi threaded state use an enum with
'disabled' and 'enabled' states.
Tested:
./tools/testing/selftests/net/nl_netdev.py
TAP version 13
1..7
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
ok 3 nl_netdev.page_pool_check
ok 4 nl_netdev.napi_list_check
ok 5 nl_netdev.dev_set_threaded
ok 6 nl_netdev.napi_set_threaded
ok 7 nl_netdev.nsim_rxq_reset_down
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Link: https://patch.msgid.link/20250723013031.2911384-4-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement removing additional RSS contexts via Netlink.
Technically it'd be possible to shoehorn the delete operation
into ethnl_request_ops-compatible handler. The code ends
up longer than open coded version, and I think we'll need
a custom way of sending notifications at some stage (if we
allow tying the context lifetime to the netlink socket, in
the future).
Link: https://patch.msgid.link/20250717234343.2328602-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support creating contexts via Netlink. Setting flow hashing
fields on the new context is not supported at this stage,
it can be added later.
An empty indirection table is not supported. This is a carry
over from the IOCTL interface where empty indirection table
meant delete. We can repurpose empty indirection table in
Netlink but for now to avoid confusion reject it using the
policy.
Support letting user choose the ID for the new context. This was
not possible in IOCTL since the context ID field for the create
action had to be set to the ETH_RXFH_CONTEXT_ALLOC magic value.
Link: https://patch.msgid.link/20250717234343.2328602-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.
Note that one dev_set_threaded call still remains in mt76 for debugfs file.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-7-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-07-17
We've added 13 non-merge commits during the last 20 day(s) which contain
a total of 4 files changed, 712 insertions(+), 84 deletions(-).
The main changes are:
1) Avoid skipping or repeating a sk when using a TCP bpf_iter,
from Jordan Rife.
2) Clarify the driver requirement on using the XDP metadata,
from Song Yoong Siang
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
doc: xdp: Clarify driver implementation for XDP Rx metadata
selftests/bpf: Add tests for bucket resume logic in established sockets
selftests/bpf: Create iter_tcp_destroy test program
selftests/bpf: Create established sockets in socket iterator tests
selftests/bpf: Make ehash buckets configurable in socket iterator tests
selftests/bpf: Allow for iteration over multiple states
selftests/bpf: Allow for iteration over multiple ports
selftests/bpf: Add tests for bucket resume logic in listening sockets
bpf: tcp: Avoid socket skips and repeats during iteration
bpf: tcp: Use bpf_tcp_iter_batch_item for bpf_tcp_iter_state batch items
bpf: tcp: Get rid of st_bucket_done
bpf: tcp: Make sure iter->batch always contains a full bucket snapshot
bpf: tcp: Make mem flags configurable through bpf_iter_tcp_realloc_batch
====================
Link: https://patch.msgid.link/20250717191731.4142326-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for ETHTOOL_SRXFH (setting hashing fields) in RSS_SET.
The tricky part is dealing with symmetric hashing. In netlink user
can change the hashing fields and symmetric hash in one request,
in IOCTL the two used to be set via different uAPI requests.
Since fields and hash function config are still separate driver
callbacks - changes to the two are not atomic. Keep things simple
and validate the settings against both pre- and post- change ones.
Meaning that we will reject the config request if user tries
to correct the flow fields and set input_xfrm in one request,
or disables input_xfrm and makes flow fields non-symmetric.
We can adjust it later if there's a real need. Starting simple feels
right, and potentially partially applying the settings isn't nice,
either.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-11-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support configuring symmetric hashing via Netlink.
We have the flow field config prepared as part of SET handling,
so scan it for conflicts instead of querying the driver again.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-10-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support setting RSS hashing key via ethtool Netlink.
Use the Netlink policy to make sure user doesn't pass
an empty key, "resetting" the key is not a thing.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support setting RSS hash function / algo via ethtool Netlink.
Like IOCTL we don't validate that the function is within the
range known to the kernel. The drivers do a pretty good job
validating the inputs, and the IDs are technically "dynamically
queried" rather than part of uAPI.
Only change should be that in Netlink we don't support user
explicitly passing ETH_RSS_HASH_NO_CHANGE (0), if no change
is requested the attribute should be absent.
The ETH_RSS_HASH_NO_CHANGE is retained in driver-facing
API for consistency (not that I see a strong reason for it).
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add initial support for RSS_SET, for now only operations on
the indirection table are supported.
Unlike the ioctl don't check if at least one parameter is
being changed. This is how other ethtool-nl ops behave,
so pick the ethtool-nl consistency vs copying ioctl behavior.
There are two special cases here:
1) resetting the table to defaults;
2) support for tables of different size.
For (1) I use an empty Netlink attribute (array of size 0).
(2) may require some background. AFAICT a lot of modern devices
allow allocating RSS tables of different sizes. mlx5 can upsize
its tables, bnxt has some "table size calculation", and Intel
folks asked about RSS table sizing in context of resource allocation
in the past. The ethtool IOCTL API has a concept of table size,
but right now the user is expected to provide a table exactly
the size the device requests. Some drivers may change the table
size at runtime (in response to queue count changes) but the
user is not in control of this. What's not great is that all
RSS contexts share the same table size. For example a device
with 128 queues enabled, 16 RSS contexts 8 queues in each will
likely have 256 entry tables for each of the 16 contexts,
while 32 would be more than enough given each context only has
8 queues. To address this the Netlink API should avoid enforcing
table size at the uAPI level, and should allow the user to express
the min table size they expect.
To fully solve (2) we will need more driver plumbing but
at the uAPI level this patch allows the user to specify
a table size smaller than what the device advertises. The device
table size must be a multiple of the user requested table size.
We then replicate the user-provided table to fill the full device
size table. This addresses the "allow the user to express the min
table size" objective, while not enforcing any fixed size.
From Netlink perspective .get_rxfh_indir_size() is now de facto
the "max" table size supported by the device.
We may choose to support table replication in ethtool, too,
when we actually plumb this thru the device APIs.
Initially I was considering moving full pattern generation
to the kernel (which queues to use, at which frequency and
what min sequence length). I don't think this complexity
would buy us much and most if not all devices have pow-2
table sizes, which simplifies the replication a lot.
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250716000331.1378807-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement the PCIe Congestion Event notifier which triggers a work item
to query the PCIe Congestion Event object. The result of the congestion
state is reflected in the new ethtool stats:
* pci_bw_inbound_high: the device has crossed the high threshold for
inbound PCIe traffic.
* pci_bw_inbound_low: the device has crossed the low threshold for
inbound PCIe traffic
* pci_bw_outbound_high: the device has crossed the high threshold for
outbound PCIe traffic.
* pci_bw_outbound_low: the device has crossed the low threshold for
outbound PCIe traffic
The high and low thresholds are currently configured at 90% and 75%.
These are hysteresis thresholds which help to check if the
PCI bus on the device side is in a congested state.
If low + 1 = high then the device is in a congested state. If low == high
then the device is not in a congested state.
The counters are also documented.
A follow-up patch will make the thresholds configurable.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1752589821-145787-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Clarify that drivers must remove device-reserved metadata from the
data_meta area before passing frames to XDP programs.
Additionally, expand the explanation of how userspace and BPF programs
should coordinate the use of METADATA_SIZE, and add a detailed diagram
to illustrate pointer adjustments and metadata layout.
Also describe the requirements and constraints enforced by
bpf_xdp_adjust_meta().
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20250716154846.3513575-1-yoong.siang.song@intel.com
|
|
Add a new SNMP MIB : LINUX_MIB_BEYOND_WINDOW
Incremented when an incoming packet is received beyond the
receiver window.
nstat -az | grep TcpExtBeyondWindow
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250711114006.480026-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A net device has a threaded sysctl that can be used to enable threaded
NAPI polling on all of the NAPI contexts under that device. Allow
enabling threaded NAPI polling at individual NAPI level using netlink.
Extend the netlink operation `napi-set` and allow setting the threaded
attribute of a NAPI. This will enable the threaded polling on a NAPI
context.
Add a test in `nl_netdev.py` that verifies various cases of threaded
NAPI being set at NAPI and at device level.
Tested
./tools/testing/selftests/net/nl_netdev.py
TAP version 13
1..7
ok 1 nl_netdev.empty_check
ok 2 nl_netdev.lo_check
ok 3 nl_netdev.page_pool_check
ok 4 nl_netdev.napi_list_check
ok 5 nl_netdev.dev_set_threaded
ok 6 nl_netdev.napi_set_threaded
ok 7 nl_netdev.nsim_rxq_reset_down
# Totals: pass:7 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250710211203.3979655-1-skhawaja@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next (v2)
The following series contains an initial small batch of Netfilter
updates for net-next:
1) Remove DCCP conntrack support, keep DCCP matches around in order to
avoid breakage when loading ruleset, add Kconfig to wrap the code
so it can be disabled by distributors.
2) Remove buggy code aiming at shrinking netlink deletion event, then
re-add it correctly in another patch. This is to prevent -stable to
pick up on a fix that breaks old userspace. From Phil Sutter.
3) Missing WARN_ON_ONCE() to check for lockdep_commit_lock_is_held()
to uncover bugs. From Fedor Pchelkin.
* tag 'nf-next-25-07-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: adjust lockdep assertions handling
netfilter: nf_tables: Reintroduce shortened deletion notifications
netfilter: nf_tables: Drop dead code from fill_*_info routines
netfilter: conntrack: remove DCCP protocol support
====================
Link: https://patch.msgid.link/20250710010706.2861281-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement ETHTOOL_GRXFH over Netlink. The number of flow types is
reasonable (around 20) so report all of them at once for simplicity.
Do not maintain the flow ID mapping with ioctl at the uAPI level.
This gives us a chance to clean up the confusion that come from
RxNFC vs RxFH (flow direction vs hashing) in the ioctl.
Try to align with the names used in ethtool CLI, they seem to have
stood the test of time just fine. One annoyance is that we still
call L4 ports the weird names, but I guess they also apply to IPSec
(where they cover the SPI) so it is what it is.
$ ynl --family ethtool --dump rss-get
{
"header": {
"dev-index": 1,
"dev-name": "enp1s0"
},
"hfunc": 1,
"hkey": b"...",
"indir": [0, 1, ...],
"flow-hash": {
"ether": {"l2da"},
"ah-esp4": {"ip-src", "ip-dst"},
"ah-esp6": {"ip-src", "ip-dst"},
"ah4": {"ip-src", "ip-dst"},
"ah6": {"ip-src", "ip-dst"},
"esp4": {"ip-src", "ip-dst"},
"esp6": {"ip-src", "ip-dst"},
"ip4": {"ip-src", "ip-dst"},
"ip6": {"ip-src", "ip-dst"},
"sctp4": {"ip-src", "ip-dst"},
"sctp6": {"ip-src", "ip-dst"},
"udp4": {"ip-src", "ip-dst"},
"udp6": {"ip-src", "ip-dst"}
"tcp4": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"},
"tcp6": {"l4-b-0-1", "l4-b-2-3", "ip-src", "ip-dst"},
},
}
Link: https://patch.msgid.link/20250708220640.2738464-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|