| Age | Commit message (Collapse) | Author |
|
Instead of simply forcing a 0 payload_len in IPv6 header,
implement RFC 2675 and insert a custom extension header.
Note that only TCP stack is currently potentially generating
jumbograms, and that this extension header is purely local,
it wont be sent on a physical link.
This is needed so that packet capture (tcpdump and friends)
can properly dissect these large packets.
Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow the gro_max_size to exceed a value larger than 65536.
There weren't really any external limitations that prevented this other
than the fact that IPv4 only supports a 16 bit length field. Since we have
the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to
exceed this value and for IPv4 and non-TCP flows we can cap things at 65536
via a constant rather than relying on gro_max_size.
[edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make sure we will not overflow shinfo->gso_segs
Minimal TCP MSS size is 8 bytes, and shinfo->gso_segs
is a 16bit field.
TCP_MIN_GSO_SIZE is currently defined in include/net/tcp.h,
it seems cleaner to not bring tcp details into include/linux/netdevice.h
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The code for gso_max_size was added originally to allow for debugging and
workaround of buggy devices that couldn't support TSO with blocks 64K in
size. The original reason for limiting it to 64K was because that was the
existing limits of IPv4 and non-jumbogram IPv6 length fields.
With the addition of Big TCP we can remove this limit and allow the value
to potentially go up to UINT_MAX and instead be limited by the tso_max_size
value.
So in order to support this we need to go through and clean up the
remaining users of the gso_max_size value so that the values will cap at
64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value
so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper
limit for GSO_MAX_SIZE.
v6: (edumazet) fixed a compile error if CONFIG_IPV6=n,
in a new sk_trim_gso_size() helper.
netif_set_tso_max_size() caps the requested TSO size
with GSO_MAX_SIZE.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Very few drivers actually have Kconfig knobs for adding
-DDEBUG. 8 according to a quick grep, while there are
93 users of skb_checksum_none_assert(). Switch to the
new DEBUG_NET_WARN_ON_ONCE() to catch bad skbs.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220511172305.1382810-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No conflicts.
Build issue in drivers/net/ethernet/sfc/ptp.c
54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, and bluetooth.
No outstanding fires.
Current release - regressions:
- eth: atlantic: always deep reset on pm op, fix null-deref
Current release - new code bugs:
- rds: use maybe_get_net() when acquiring refcount on TCP sockets
[refinement of a previous fix]
- eth: ocelot: mark traps with a bool instead of guessing type based
on list membership
Previous releases - regressions:
- net: fix skipping features in for_each_netdev_feature()
- phy: micrel: fix null-derefs on suspend/resume and probe
- bcmgenet: check for Wake-on-LAN interrupt probe deferral
Previous releases - always broken:
- ipv4: drop dst in multicast routing path, prevent leaks
- ping: fix address binding wrt vrf
- net: fix wrong network header length when BPF protocol translation
is used on skbs with a fraglist
- bluetooth: fix the creation of hdev->name
- rfkill: uapi: fix RFKILL_IOCTL_MAX_SIZE ioctl request definition
- wifi: iwlwifi: iwl-dbg: use del_timer_sync() before freeing
- wifi: ath11k: reduce the wait time of 11d scan and hw scan while
adding an interface
- mac80211: fix rx reordering with non explicit / psmp ack policy
- mac80211: reset MBSSID parameters upon connection
- nl80211: fix races in nl80211_set_tx_bitrate_mask()
- tls: fix context leak on tls_device_down
- sched: act_pedit: really ensure the skb is writable
- batman-adv: don't skb_split skbuffs with frag_list
- eth: ocelot: fix various issues with TC actions (null-deref; bad
stats; ineffective drops; ineffective filter removal)"
* tag 'net-5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
tls: Fix context leak on tls_device_down
net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()
net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending
net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()
mlxsw: Avoid warning during ip6gre device removal
net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral
net: ethernet: mediatek: ppe: fix wrong size passed to memset()
Bluetooth: Fix the creation of hdev->name
i40e: i40e_main: fix a missing check on list iterator
net/sched: act_pedit: really ensure the skb is writable
s390/lcs: fix variable dereferenced before check
s390/ctcm: fix potential memory leak
s390/ctcm: fix variable dereferenced before check
net: atlantic: verify hw_head_ lies within TX buffer ring
net: atlantic: add check for MAX_SKB_FRAGS
net: atlantic: reduce scope of is_rsc_complete
net: atlantic: fix "frag[0] not initialized"
net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
net: phy: micrel: Fix incorrect variable type in micrel
decnet: Use container_of() for struct dn_neigh casts
...
|
|
As we continue to narrow the scope of what the FORTIFY memcpy() will
accept and build alternative APIs that give the compiler appropriate
visibility into more complex memcpy scenarios, there is a need for
"unfortified" memcpy use in rare cases where combinations of compiler
behaviors, source code layout, etc, result in cases where the stricter
memcpy checks need to be bypassed until appropriate solutions can be
developed (i.e. fix compiler bugs, code refactoring, new API, etc). The
intention is for this to be used only if there's no other reasonable
solution, for its use to include a justification that can be used
to assess future solutions, and for it to be temporary.
Example usage included, based on analysis and discussion from:
https://lore.kernel.org/netdev/CANn89iLS_2cshtuXPyNUGDPaic=sJiYfvTb_wNLgWrZRyBxZ_g@mail.gmail.com
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Coco Li <lixiaoyan@google.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220511025301.3636666-1-keescook@chromium.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Make sure skb_transport_header() and skb_transport_offset() uses
are not fooled if the transport header has not been set.
This change will likely expose existing bugs in linux networking stacks.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove from include/linux/netdevice.h helpers
that send debug/info/warnings to syslog.
We plan adding more helpers in following patches.
v2: added two includes, and 'struct net_device' forward declaration
to avoid compile errors (kernel bots)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-05-09
1) Gavin Li, adds exit route from waiting for FW init on device boot and
increases FW init timeout on health recovery flow
2) Support 4 ports HCAs LAG mode
Mark Bloch Says:
================
This series adds to mlx5 drivers support for 4 ports HCAs.
Starting with ConnectX-7 HCAs with 4 ports are possible.
As most driver parts aren't affected by such configuration most driver
code is unchanged.
Specially the only affected areas are:
- Lag
- Devcom
- Merged E-Switch
- Single FDB E-Switch
Lag was chosen to be converted first. Creating hardware LAG when all 4
ports are added to the same bond device.
Devom, merge E-Switch and single FDB E-Switch, are marked as supporting
only 2 ports HCAs and future patches will add support for 4 ports HCAs.
In order to activate the hardware lag a user can execute the:
ip link add bond0 type bond
ip link set bond0 type bond miimon 100 mode 2
ip link set eth2 master bond0
ip link set eth3 master bond0
ip link set eth4 master bond0
ip link set eth5 master bond0
Where eth2, eth3, eth4 and eth5 are the PFs of the same HCA.
================
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Long time ago Tom added a giant comment to skbuff.h explaining
checksums. Now that we have a place in Documentation for skbuff
docs we should render it. Sprinkle some markup while at it.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The comment about shinfo->dataref split is really unhelpful,
at least to me. Rewrite it and render it to skb documentation.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add basic skb documentation. It's mostly an intro to the subsequent
patches - it would looks strange if we documented advanced topics
without covering the basics in any way.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a physical clock supports a free running cycle counter, then
timestamps shall be based on this time too. For TX it is known in
advance before the transmission if a timestamp based on the free running
cycle counter is needed. For RX it is impossible to know which timestamp
is needed before the packet is received and assigned to a socket.
Support late timestamp determination by a network device. Therefore, an
address/cookie is stored within the new netdev_data field of struct
skb_shared_hwtstamps. This address/cookie is provided to a new network
device function called ndo_get_tstamp(), which returns a timestamp based
on the normal/adjustable time or based on the free running cycle
counter. If function is not supported, then timestamp handling is not
changed.
This mechanism is intended for RX, but TX use is also possible.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ptp_convert_timestamp() converts only the timestamp hwtstamp, which is
a field of the argument with the type struct skb_shared_hwtstamps *. So
a pointer to the hwtstamp field of this structure is sufficient.
Rework ptp_convert_timestamp() to use an argument of type ktime_t *.
This allows to add additional timestamp manipulation stages before the
call of ptp_convert_timestamp().
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The free running cycle counter of physical clocks called cycles shall be
used for hardware timestamps to enable synchronisation.
Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals driver to
provide a TX timestamp based on cycles if cycles are supported.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ptp vclocks require a free running time for their timecounter.
Currently only a physical clock forced to free running is supported.
If vclocks are used, then the physical clock cannot be synchronized
anymore. The synchronized time is not available in hardware in this
case. As a result, timed transmission with TAPRIO hardware support
is not possible anymore.
If hardware would support a free running time additionally to the
physical clock, then the physical clock does not need to be forced to
free running. Thus, the physical clocks can still be synchronized
while vclocks are in use.
The physical clock could be used to synchronize the time domain of the
TSN network and trigger TAPRIO. In parallel vclocks can be used to
synchronize other time domains.
Introduce support for a free running cycle counter called cycles to
physical clocks. Rework ptp vclocks to use this free running cycle
counter. Default implementation is based on time of physical clock.
Thus, behavior of ptp vclocks based on physical clocks without free
running cycle counter is identical to previous behavior.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Lag state has become very complicated with many modes, flags, types and
port selections methods and future work will add additional features.
Add a debugfs to query the current lag state. A new directory named "lag"
will be created under the mlx5 debugfs directory. As the driver has
debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag
For example:
/sys/kernel/debug/mlx5/0000:08:00.0/lag
The following files are exposed:
- state: Returns "active" or "disabled". If "active" it means hardware
lag is active.
- members: Returns the BDFs of all the members of lag object.
- type: Returns the type of the lag currently configured. Valid only
if hardware lag is active.
* "roce" - Members are bare metal PFs.
* "switchdev" - Members are in switchdev mode.
* "multipath" - ECMP offloads.
- port_sel_mode: Returns the egress port selection method, valid
only if hardware lag is active.
* "queue_affinity" - Egress port is selected by
the QP/SQ affinity.
* "hash" - Egress port is selected by hash done on
each packet. Controlled by: xmit_hash_policy of the
bond device.
- flags: Returns flags that are specific per lag @type. Valid only if
hardware lag is active.
* "shared_fdb" - "on" or "off", if "on" single FDB is used.
- mapping: Returns the mapping which is used to select egress port.
Valid only if hardware lag is active.
If @port_sel_mode is "hash" returns the active egress ports.
The hash result will select only active ports.
if @port_sel_mode is "queue_affinity" returns the mapping
between the configured port affinity of the QP/SQ and actual
egress port. For example:
* 1:1 - Mapping means if the configured affinity is port 1
traffic will egress via port 1.
* 1:2 - Mapping means if the configured affinity is port 1
traffic will egress via port 2. This can happen
if port 1 is down or in active/backup mode and port 1
is backup.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Increase the define MLX5_MAX_PORTS to 4 as the driver is ready
to support NICs with 4 ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently, removing a device needs to get the driver interface lock before
doing any cleanup. If the driver is waiting in a loop for FW init, there
is no way to cancel the wait, instead the device cleanup waits for the
loop to conclude and release the lock.
To allow immediate response to remove device commands, check the TEARDOWN
flag while waiting for FW init, and exit the loop if it has been set.
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Export genphy_c45_baset1_read_status() to make it reusable by PHY drivers.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move baset1 specific part of genphy_c45_read_pma() code to
separate function to make it reusable by PHY drivers.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move baset1 specific part of genphy_c45_pma_setup_forced() code to
separate function to make it reusable by PHY drivers.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add extack support to .ndo_fdb_del in netdevice.h and
all related methods.
Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Helper to calculate the linear data space in the skb.
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add macros to get the next or previous entries and wraparound if
needed. For example, calling list_next_entry_circular() on the last
element should return the first element in the list.
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The last two users were floppy.c and ataflop.c respectively, it was
verified that no other drivers makes use of this, so let's remove it.
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Minh Yuan <yuanmingbuaa@gmail.com>
Cc: Denis Efremov <efremov@linux.com>,
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix a socket leak when setting up an AF_LOCAL RPC client
- Ensure that knfsd connects to the gss-proxy daemon on setup
Bugfixes:
- Fix a refcount leak when migrating a task off an offlined transport
- Don't gratuitously invalidate inode attributes on delegation return
- Don't leak sockets in xs_local_connect()
- Ensure timely close of disconnected AF_LOCAL sockets"
* tag 'nfs-for-5.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
Revert "SUNRPC: attempt AF_LOCAL connect on setup"
SUNRPC: Ensure gss-proxy connects on setup
SUNRPC: Ensure timely close of disconnected AF_LOCAL sockets
SUNRPC: Don't leak sockets in xs_local_connect()
NFSv4: Don't invalidate inode attributes on delegation return
SUNRPC release the transport of a relocated task with an assigned transport
|
|
These are now internal to the core, no need to expose them.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Up until commit 46e6b992c250 ("rtnetlink: allow GSO maximums to
be set on device creation") the gso_max_segs and gso_max_size
of a device were not controlled from user space.
The quoted commit added the ability to control them because of
the following setup:
netns A | netns B
veth<->veth eth0
If eth0 has TSO limitations and user wants to efficiently forward
traffic between eth0 and the veths they should copy the TSO
limitations of eth0 onto the veths. This would happen automatically
for macvlans or ipvlan but veth users are not so lucky (given the
loose coupling).
Unfortunately the commit in question allowed users to also override
the limits on real HW devices.
It may be useful to control the max GSO size and someone may be using
that ability (not that I know of any user), so create a separate set
of knobs to reliably record the TSO limitations. Validate the user
requests.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To make later patches smaller create a helper for inheriting
the TSO limitations of a lower device. The TSO in the name
is not an accident, subsequent patches will replace GSO
with TSO in more names.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The find_next_netdev_feature() macro gets the "remaining length",
not bit index.
Passing "bit - 1" for the following iteration is wrong as it skips
the adjacent bit. Pass "bit" instead.
Fixes: 3b89ea9c5902 ("net: Fix for_each_netdev_feature on Big endian")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20220504080914.1918-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
msg_zerocopy_alloc is only used by msg_zerocopy_realloc; remove the
export and make static in skbuff.c
Signed-off-by: David Ahern <dsahern@kernel.org>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/r/20220504170947.18773-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull folio fixes from Matthew Wilcox:
"Two folio fixes for 5.18.
Darrick and Brian have done amazing work debugging the race I created
in the folio BIO iterator. The readahead problem was deterministic, so
easy to fix.
- Fix a race when we were calling folio_next() in the BIO folio iter
without holding a reference, meaning the folio could be split or
freed, and we'd jump to the next page instead of the intended next
folio.
- Fix readahead creating single-page folios instead of the intended
large folios when doing reads that are not a power of two in size"
* tag 'folio-5.18f' of git://git.infradead.org/users/willy/pagecache:
mm/readahead: Fix readahead with large folios
block: Do not call folio_next() on an unreferenced folio
|
|
tools/testing/selftests/net/forwarding/Makefile
f62c5acc800e ("selftests/net/forwarding: add missing tests to Makefile")
50fe062c806e ("selftests: forwarding: new test, verify host mdb entries")
https://lore.kernel.org/all/20220502111539.0b7e4621@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, rxrpc and wireguard.
Previous releases - regressions:
- igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter()
- mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter()
- rds: acquire netns refcount on TCP sockets
- rxrpc: enable IPv6 checksums on transport socket
- nic: hinic: fix bug of wq out of bound access
- nic: thunder: don't use pci_irq_vector() in atomic context
- nic: bnxt_en: fix possible bnxt_open() failure caused by wrong RFS
flag
- nic: mlx5e:
- lag, fix use-after-free in fib event handler
- fix deadlock in sync reset flow
Previous releases - always broken:
- tcp: fix insufficient TCP source port randomness
- can: grcan: grcan_close(): fix deadlock
- nfc: reorder destructive operations in to avoid bugs
Misc:
- wireguard: improve selftests reliability"
* tag 'net-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
NFC: netlink: fix sleep in atomic bug when firmware download timeout
selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer
tcp: drop the hash_32() part from the index calculation
tcp: increase source port perturb table to 2^16
tcp: dynamically allocate the perturb table used by source ports
tcp: add small random increments to the source port
tcp: resalt the secret every 10 seconds
tcp: use different parts of the port_offset for index and offset
secure_seq: use the 64 bits of the siphash for port offset calculation
wireguard: selftests: set panic_on_warn=1 from cmdline
wireguard: selftests: bump package deps
wireguard: selftests: restore support for ccache
wireguard: selftests: use newer toolchains to fill out architectures
wireguard: selftests: limit parallelism to $(nproc) tests at once
wireguard: selftests: make routing loop test non-fatal
net/mlx5: Fix matching on inner TTC
net/mlx5: Avoid double clear or set of sync reset requested
net/mlx5: Fix deadlock in sync reset flow
net/mlx5e: Fix trust state reset in reload
net/mlx5e: Avoid checking offload capability in post_parse action
...
|
|
It is unsafe to call folio_next() on a folio unless you hold a reference
on it that prevents it from being split or freed. After returning
from the iterator, iomap calls folio_end_writeback() which may drop
the last reference to the page, or allow the page to be split. If that
happens, the iterator will not advance far enough through the bio_vec,
leading to assertion failures like the BUG() in folio_end_writeback()
that checks we're not trying to end writeback on a page not currently
under writeback. Other assertion failures were also seen, but they're
all explained by this one bug.
Fix the bug by remembering where the next folio starts before returning
from the iterator. There are other ways of fixing this bug, but this
seems the simplest.
Reported-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: Brian Foster <bfoster@redhat.com>
Tested-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
|
mlx5 doesn't allow to configure any AEAD ICV length other than 128,
so remove the logic that configures other unsupported values.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The mlx5 IPsec code has logical separation between code that operates
with XFRM objects (ipsec.c), HW objects (ipsec_offload.c), flow steering
logic (ipsec_fs.c) and data path (ipsec_rxtx.c).
Such separation makes sense for C-files, but isn't needed at all for
H-files as they are included in batch anyway.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
All callers build xfrm attributes with help of mlx5e_ipsec_build_accel_xfrm_attrs()
function that ensure validity of attributes. There is no need to recheck
them again.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
mlx5 IPsec code updated ESN through workqueue with allocation calls
in the data path, which can be saved easily if the work is created
during XFRM state initialization routine.
The locking used later in the work didn't protect from anything because
change of HW context is possible during XFRM state add or delete only,
which can cancel work and make sure that it is not running.
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Most drivers should not have to worry about selecting the right
weight for their NAPI instances and pass NAPI_POLL_WEIGHT.
It'd be best if we didn't require the argument at all and selected
the default internally.
This change prepares the ground for such reshuffling, allowing
for a smooth transition. The following API should remain after
the next release cycle:
netif_napi_add()
netif_napi_add_weight()
netif_napi_add_tx()
netif_napi_add_tx_weight()
Where the _weight() variants take an explicit weight argument.
I opted for a _weight() suffix rather than a __ prefix, because
we use __ in places to mean that caller needs to also issue a
synchronize_net() call.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220502232703.396351-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2022-05-02
1) Trivial Misc updates to mlx5 driver
2) From Mark Bloch: Flow steering, general steering refactoring/cleaning
An issue with flow steering deletion flow (when creating a rule without
dests) turned out to be easy to fix but during the fix some issue
with the flow steering creation/deletion flows have been found.
The following patch series tries to fix long standing issues with flow
steering code and hopefully preventing silly future bugs.
A) Fix an issue where a proper dest type wasn't assigned.
B) Refactor and fix dests enums values, refactor deletion
function and do proper bookkeeping of dests.
C) Change mlx5_del_flow_rules() to delete rules when there are no
no more rules attached associated with an FTE.
D) Don't call hard coded deletion function but use the node's
defined one.
E) Add a WARN_ON() to catch future bugs when an FTE with dests
is deleted.
* tag 'mlx5-updates-2022-05-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: fs, an FTE should have no dests when deleted
net/mlx5: fs, call the deletion function of the node
net/mlx5: fs, delete the FTE when there are no rules attached to it
net/mlx5: fs, do proper bookkeeping for forward destinations
net/mlx5: fs, add unused destination type
net/mlx5: fs, jump to exit point and don't fall through
net/mlx5: fs, refactor software deletion rule
net/mlx5: fs, split software and IFC flow destination definitions
net/mlx5e: TC, set proper dest type
net/mlx5e: Remove unused mlx5e_dcbnl_build_rep_netdev function
net/mlx5e: Drop error CQE handling from the XSK RX handler
net/mlx5: Print initializing field in case of timeout
net/mlx5: Delete redundant default assignment of runtime devlink params
net/mlx5: Remove useless kfree
net/mlx5: use kvfree() for kvzalloc() in mlx5_ct_fs_smfs_matcher_create
====================
Link: https://lore.kernel.org/r/
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch introdues the SYSCTL_THREE.
KUnit:
[00:10:14] ================ sysctl_test (10 subtests) =================
[00:10:14] [PASSED] sysctl_test_api_dointvec_null_tbl_data
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_maxlen_unset
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_len_is_zero
[00:10:14] [PASSED] sysctl_test_api_dointvec_table_read_but_position_set
[00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_positive
[00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_negative
[00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_positive
[00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_negative
[00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_less_int_min
[00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_greater_int_max
[00:10:14] =================== [PASSED] sysctl_test ===================
./run_kselftest.sh -c sysctl
...
ok 1 selftests: sysctl: sysctl.sh
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Akhmat Karakotov <hmukos@yandex-team.ru>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When the caller doesn't pass a destination fs_core will create a unused
rule just so a context can be returned. This unused rule
is zeroed out and its type is 0 which can be mixed up with
MLX5_FLOW_DESTINATION_TYPE_VPORT.
Create a dedicated type to differentiate between the two
named MLX5_FLOW_DESTINATION_TYPE_NONE.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Separate flow destinations between software and IFC.
Flow destination type passed by callers was used as the input in
firmware commands and over the years software only types were added
which resulted in mixing between the two.
Create an IFC enum that contains only the flow destinations defined
when talking to the firmware.
Now that there is a proper software only enum for flow destinations
the hardcoded values can be removed as the values are no longer used
in firmware commands.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
====================
pull-request: ieee802154-next 2022-05-01
Miquel Raynal landed two patch series bundled in this pull request.
The first series re-works the symbol duration handling to better
accommodate the needs of the various phy layers in ieee802154.
In the second series Miquel improves th errors handling from drivers
up mac802154. THis streamlines the error handling throughout the
ieee/mac802154 stack in preparation for sync TX to be introduced for
MLME frames.
====================
Link: https://lore.kernel.org/r/20220501194614.1198325-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- A fix to disable PCI/MSI[-X] masking for XEN_HVM guests as that is
solely controlled by the hypervisor
- A build fix to make the function prototype (__warn()) as visible as
the definition itself
- A bunch of objtool annotation fixes which have accumulated over time
- An ORC unwinder fix to handle bad input gracefully
- Well, we thought the microcode gets loaded in time in order to
restore the microcode-emulated MSRs but we thought wrong. So there's
a fix for that to have the ordering done properly
- Add new Intel model numbers
- A spelling fix
* tag 'x86_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
bug: Have __warn() prototype defined unconditionally
x86/Kconfig: fix the spelling of 'becoming' in X86_KERNEL_IBT config
objtool: Use offstr() to print address of missing ENDBR
objtool: Print data address for "!ENDBR" data warnings
x86/xen: Add ANNOTATE_NOENDBR to startup_xen()
x86/uaccess: Add ENDBR to __put_user_nocheck*()
x86/retpoline: Add ANNOTATE_NOENDBR for retpolines
x86/static_call: Add ANNOTATE_NOENDBR to static call trampoline
objtool: Enable unreachable warnings for CLANG LTO
x86,objtool: Explicitly mark idtentry_body()s tail REACHABLE
x86,objtool: Mark cpu_startup_entry() __noreturn
x86,xen,objtool: Add UNWIND hint
lib/strn*,objtool: Enforce user_access_begin() rules
MAINTAINERS: Add x86 unwinding entry
x86/unwind/orc: Recheck address range after stack info was updated
x86/cpu: Load microcode during restore_processor_state()
x86/cpu: Add new Alderlake and Raptorlake CPU model numbers
|