| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2025-09-26
1) Fix field-spanning memcpy warning in AH output.
From Charalampos Mitrodimas.
2) Replace the strcpy() calls for alg_name by strscpy().
From Miguel García.
* tag 'ipsec-next-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: xfrm_user: use strscpy() for alg_name
net: ipv6: fix field-spanning memcpy warning in AH output
====================
Link: https://patch.msgid.link/20250926053025.2242061-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove is_ipv6 from napi_gro_cb and use sk->sk_family instead.
This frees up space for another ip_fixedid bit that will be added
in the next commit.
udp_sock_create always creates either a AF_INET or a AF_INET6 socket,
so using sk->sk_family is reliable. In IPv6-FOU, cfg->ipv6_v6only is
always enabled.
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250923085908.4687-2-richardbgobert@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
busylock was protecting UDP sockets against packet floods,
but unfortunately was not protecting the host itself.
Under stress, many cpus could spin while acquiring the busylock,
and NIC had to drop packets. Or packets would be dropped
in cpu backlog if RPS/RFS were in place.
This patch replaces the busylock by intermediate
lockless queues. (One queue per NUMA node).
This means that fewer number of cpus have to acquire
the UDP receive queue lock.
Most of the cpus can either:
- immediately drop the packet.
- or queue it in their NUMA aware lockless queue.
Then one of the cpu is chosen to process this lockless queue
in a batch.
The batch only contains packets that were cooked on the same
NUMA node, thus with very limited latency impact.
Tested:
DDOS targeting a victim UDP socket, on a platform with 6 NUMA nodes
(Intel(R) Xeon(R) 6985P-C)
Before:
nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams 1004179 0.0
Udp6InErrors 3117 0.0
Udp6RcvbufErrors 3117 0.0
After:
nstat -n ; sleep 1 ; nstat | grep Udp
Udp6InDatagrams 1116633 0.0
Udp6InErrors 14197275 0.0
Udp6RcvbufErrors 14197275 0.0
We can see this host can now proces 14.2 M more packets per second
while under attack, and the victim socket can receive 11 % more
packets.
I used a small bpftrace program measuring time (in us) spent in
__udp_enqueue_schedule_skb().
Before:
@udp_enqueue_us[398]:
[0] 24901 |@@@ |
[1] 63512 |@@@@@@@@@ |
[2, 4) 344827 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4, 8) 244673 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[8, 16) 54022 |@@@@@@@@ |
[16, 32) 222134 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[32, 64) 232042 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[64, 128) 4219 | |
[128, 256) 188 | |
After:
@udp_enqueue_us[398]:
[0] 5608855 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1] 1111277 |@@@@@@@@@@ |
[2, 4) 501439 |@@@@ |
[4, 8) 102921 | |
[8, 16) 29895 | |
[16, 32) 43500 | |
[32, 64) 31552 | |
[64, 128) 979 | |
[128, 256) 13 | |
Note that the remaining bottleneck for this platform is in
udp_drops_inc() because we limited struct numa_drop_counters
to only two nodes so far.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250922104240.2182559-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet_hash() and inet6_hash() are exactly the same.
Also, we do not need to export inet6_hash().
Let's consolidate the two into __inet_hash() and rename it to inet_hash().
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250919083706.1863217-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
__inet_hash() is called from inet_hash() and inet6_hash with osk NULL.
Let's remove the 2nd arg from __inet_hash().
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250919083706.1863217-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Daniel Zahka says:
==================
add basic PSP encryption for TCP connections
This is v13 of the PSP RFC [1] posted by Jakub Kicinski one year
ago. General developments since v1 include a fork of packetdrill [2]
with support for PSP added, as well as some test cases, and an
implementation of PSP key exchange and connection upgrade [3]
integrated into the fbthrift RPC library. Both [2] and [3] have been
tested on server platforms with PSP-capable CX7 NICs. Below is the
cover letter from the original RFC:
Add support for PSP encryption of TCP connections.
PSP is a protocol out of Google:
https://github.com/google/psp/blob/main/doc/PSP_Arch_Spec.pdf
which shares some similarities with IPsec. I added some more info
in the first patch so I'll keep it short here.
The protocol can work in multiple modes including tunneling.
But I'm mostly interested in using it as TLS replacement because
of its superior offload characteristics. So this patch does three
things:
- it adds "core" PSP code
PSP is offload-centric, and requires some additional care and
feeding, so first chunk of the code exposes device info.
This part can be reused by PSP implementations in xfrm, tunneling etc.
- TCP integration TLS style
Reuse some of the existing concepts from TLS offload, such as
attaching crypto state to a socket, marking skbs as "decrypted",
egress validation. PSP does not prescribe key exchange protocols.
To use PSP as a more efficient TLS offload we intend to perform
a TLS handshake ("inline" in the same TCP connection) and negotiate
switching to PSP based on capabilities of both endpoints.
This is also why I'm not including a software implementation.
Nobody would use it in production, software TLS is faster,
it has larger crypto records.
- mlx5 implementation
That's mostly other people's work, not 100% sure those folks
consider it ready hence the RFC in the title. But it works :)
Not posted, queued a branch [4] are follow up pieces:
- standard stats
- netdevsim implementation and tests
[1] https://lore.kernel.org/netdev/20240510030435.120935-1-kuba@kernel.org/
[2] https://github.com/danieldzahka/packetdrill
[3] https://github.com/danieldzahka/fbthrift/tree/dzahka/psp
[4] https://github.com/kuba-moo/linux/tree/psp
Comments we intend to defer to future series:
- we prefer to keep the version field in the tx-assoc netlink
request, because it makes parsing keys require less state early
on, but we are willing to change in the next version of this
series.
- using a static branch to wrap psp_enqueue_set_decrypted() and
other functions called from tcp.
- using INDIRECT_CALL for tls/psp in sk_validate_xmit_skb(). We
prefer to address this in a dedicated patch series, so that this
series does not need to modify the way tls_validate_xmit_skb() is
declared and stubbed out.
v12: https://lore.kernel.org/netdev/20250916000559.1320151-1-kuba@kernel.org/
v11: https://lore.kernel.org/20250911014735.118695-1-daniel.zahka@gmail.com
v10: https://lore.kernel.org/netdev/20250828162953.2707727-1-daniel.zahka@gmail.com/
v9: https://lore.kernel.org/netdev/20250827155340.2738246-1-daniel.zahka@gmail.com/
v8: https://lore.kernel.org/netdev/20250825200112.1750547-1-daniel.zahka@gmail.com/
v7: https://lore.kernel.org/netdev/20250820113120.992829-1-daniel.zahka@gmail.com/
v6: https://lore.kernel.org/netdev/20250812003009.2455540-1-daniel.zahka@gmail.com/
v5: https://lore.kernel.org/netdev/20250723203454.519540-1-daniel.zahka@gmail.com/
v4: https://lore.kernel.org/netdev/20250716144551.3646755-1-daniel.zahka@gmail.com/
v3: https://lore.kernel.org/netdev/20250702171326.3265825-1-daniel.zahka@gmail.com/
v2: https://lore.kernel.org/netdev/20250625135210.2975231-1-daniel.zahka@gmail.com/
v1: https://lore.kernel.org/netdev/20240510030435.120935-1-kuba@kernel.org/
==================
Links: https://patch.msgid.link/20250917000954.859376-1-daniel.zahka@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
* add-basic-psp-encryption-for-tcp-connections:
net/mlx5e: Implement PSP key_rotate operation
net/mlx5e: Add Rx data path offload
psp: provide decapsulation and receive helper for drivers
net/mlx5e: Configure PSP Rx flow steering rules
net/mlx5e: Add PSP steering in local NIC RX
net/mlx5e: Implement PSP Tx data path
psp: provide encapsulation helper for drivers
net/mlx5e: Implement PSP operations .assoc_add and .assoc_del
net/mlx5e: Support PSP offload functionality
psp: track generations of device key
net: psp: update the TCP MSS to reflect PSP packet overhead
net: psp: add socket security association code
net: tcp: allow tcp_timewait_sock to validate skbs before handing to device
net: move sk_validate_xmit_skb() to net/core/dev.c
psp: add op for rotation of device key
tcp: add datapath logic for PSP with inline key exchange
net: modify core data structures for PSP datapath support
psp: base PSP device support
psp: add documentation
|
|
PSP eats 40B of header space. Adjust MSS appropriately.
We can either modify tcp_mtu_to_mss() / tcp_mss_to_mtu()
or reuse icsk_ext_hdr_len. The former option is more TCP
specific and has runtime overhead. The latter is a bit
of a hack as PSP is not an ext_hdr. If one squints hard
enough, UDP encap is just a more practical version of
IPv6 exthdr, so go with the latter. Happy to change.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250917000954.859376-10-daniel.zahka@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add validation points and state propagation to support PSP key
exchange inline, on TCP connections. The expectation is that
application will use some well established mechanism like TLS
handshake to establish a secure channel over the connection and
if both endpoints are PSP-capable - exchange and install PSP keys.
Because the connection can existing in PSP-unsecured and PSP-secured
state we need to make sure that there are no race conditions or
retransmission leaks.
On Tx - mark packets with the skb->decrypted bit when PSP key
is at the enqueue time. Drivers should only encrypt packets with
this bit set. This prevents retransmissions getting encrypted when
original transmission was not. Similarly to TLS, we'll use
sk->sk_validate_xmit_skb to make sure PSP skbs can't "escape"
via a PSP-unaware device without being encrypted.
On Rx - validation is done under socket lock. This moves the validation
point later than xfrm, for example. Please see the documentation patch
for more details on the flow of securing a connection, but for
the purpose of this patch what's important is that we want to
enforce the invariant that once connection is secured any skb
in the receive queue has been encrypted with PSP.
Add GRO and coalescing checks to prevent PSP authenticated data from
being combined with cleartext data, or data with non-matching PSP
state. On Rx, check skb's with psp_skb_coalesce_diff() at points
before psp_sk_rx_policy_check(). After skb's are policy checked and on
the socket receive queue, skb_cmp_decrypted() is sufficient for
checking for coalescable PSP state. On Tx, tcp_write_collapse_fence()
should be called when transitioning a socket into PSP Tx state to
prevent data sent as cleartext from being coalesced with PSP
encapsulated data.
This change only adds the validation points, for ease of review.
Subsequent change will add the ability to install keys, and flesh
the enforcement logic out
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Daniel Zahka <daniel.zahka@gmail.com>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250917000954.859376-5-daniel.zahka@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Generic sk_drops_inc() reads sk->sk_drop_counters.
We know the precise location for UDP sockets.
Move sk_drop_counters out of sock_read_rxtx
so that sock_write_rxtx starts at a cache line boundary.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250916160951.541279-9-edumazet@google.com
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add READ_ONCE() annotations because np->rxpmtu can be changed
while udpv6_recvmsg() and rawv6_recvmsg() read it.
Since this is a very rarely used feature, and that udpv6_recvmsg()
and rawv6_recvmsg() read np->rxopt anyway, change the test order
so that np->rxpmtu does not need to be in a hot cache line.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250916160951.541279-4-edumazet@google.com
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ipv6_pinfo.daddr_cache is either NULL or &sk->sk_v6_daddr
We do not need 8 bytes, a boolean is enough.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250916160951.541279-3-edumazet@google.com
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ipv6_pinfo.saddr_cache is either NULL or &np->saddr.
We do not need 8 bytes, a boolean is enough.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250916160951.541279-2-edumazet@google.com
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Accurate ECN negotiation parts based on the specification:
https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
Accurate ECN is negotiated using ECE, CWR and AE flags in the
TCP header. TCP falls back into using RFC3168 ECN if one of the
ends supports only RFC3168-style ECN.
The AccECN negotiation includes reflecting IP ECN field value
seen in SYN and SYNACK back using the same bits as negotiation
to allow responding to SYN CE marks and to detect ECN field
mangling. CE marks should not occur currently because SYN=1
segments are sent with Non-ECT in IP ECN field (but proposal
exists to remove this restriction).
Reflecting SYN IP ECN field in SYNACK is relatively simple.
Reflecting SYNACK IP ECN field in the final/third ACK of
the handshake is more challenging. Linux TCP code is not well
prepared for using the final/third ACK a signalling channel
which makes things somewhat complicated here.
tcp_ecn sysctl can be used to select the highest ECN variant
(Accurate ECN, ECN, No ECN) that is attemped to be negotiated and
requested for incoming connection and outgoing connection:
TCP_ECN_IN_NOECN_OUT_NOECN, TCP_ECN_IN_ECN_OUT_ECN,
TCP_ECN_IN_ECN_OUT_NOECN, TCP_ECN_IN_ACCECN_OUT_ACCECN,
TCP_ECN_IN_ACCECN_OUT_ECN, and TCP_ECN_IN_ACCECN_OUT_NOECN.
After this patch, the size of tcp_request_sock remains unchanged
and no new holes are added. Below are the pahole outcomes before
and after this patch:
[BEFORE THIS PATCH]
struct tcp_request_sock {
[...]
u32 rcv_nxt; /* 352 4 */
u8 syn_tos; /* 356 1 */
/* size: 360, cachelines: 6, members: 16 */
}
[AFTER THIS PATCH]
struct tcp_request_sock {
[...]
u32 rcv_nxt; /* 352 4 */
u8 syn_tos; /* 356 1 */
bool accecn_ok; /* 357 1 */
u8 syn_ect_snt:2; /* 358: 0 1 */
u8 syn_ect_rcv:2; /* 358: 2 1 */
u8 accecn_fail_mode:4; /* 358: 4 1 */
/* size: 360, cachelines: 6, members: 20 */
}
After this patch, the size of tcp_sock remains unchanged and no new
holes are added. Also, 4 bits of the existing 2-byte hole are exploited.
Below are the pahole outcomes before and after this patch:
[BEFORE THIS PATCH]
struct tcp_sock {
[...]
u8 dup_ack_counter:2; /* 2761: 0 1 */
u8 tlp_retrans:1; /* 2761: 2 1 */
u8 unused:5; /* 2761: 3 1 */
u8 thin_lto:1; /* 2762: 0 1 */
u8 fastopen_connect:1; /* 2762: 1 1 */
u8 fastopen_no_cookie:1; /* 2762: 2 1 */
u8 fastopen_client_fail:2; /* 2762: 3 1 */
u8 frto:1; /* 2762: 5 1 */
/* XXX 2 bits hole, try to pack */
[...]
u8 keepalive_probes; /* 2765 1 */
/* XXX 2 bytes hole, try to pack */
[...]
/* size: 3200, cachelines: 50, members: 164 */
}
[AFTER THIS PATCH]
struct tcp_sock {
[...]
u8 dup_ack_counter:2; /* 2761: 0 1 */
u8 tlp_retrans:1; /* 2761: 2 1 */
u8 syn_ect_snt:2; /* 2761: 3 1 */
u8 syn_ect_rcv:2; /* 2761: 5 1 */
u8 thin_lto:1; /* 2761: 7 1 */
u8 fastopen_connect:1; /* 2762: 0 1 */
u8 fastopen_no_cookie:1; /* 2762: 1 1 */
u8 fastopen_client_fail:2; /* 2762: 2 1 */
u8 frto:1; /* 2762: 4 1 */
/* XXX 3 bits hole, try to pack */
[...]
u8 keepalive_probes; /* 2765 1 */
u8 accecn_fail_mode:4; /* 2766: 0 1 */
/* XXX 4 bits hole, try to pack */
/* XXX 1 byte hole, try to pack */
[...]
/* size: 3200, cachelines: 50, members: 166 */
}
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Co-developed-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia.com>
Co-developed-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250916082434.100722-3-chia-yu.chang@nokia-bell-labs.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter: updates for net-next
1) Don't respond to ICMP_UNREACH errors with another ICMP_UNREACH
error.
2) Support fetching the current bridge ethernet address.
This allows a more flexible approach to packet redirection
on bridges without need to use hardcoded addresses. From
Fernando Fernandez Mancera.
3) Zap a few no-longer needed conditionals from ipvs packet path
and convert to READ/WRITE_ONCE to avoid KCSAN warnings.
From Zhang Tengfei.
4) Remove a no-longer-used macro argument in ipset, from Zhen Ni.
* tag 'nf-next-25-09-11' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_reject: don't reply to icmp error messages
ipvs: Use READ_ONCE/WRITE_ONCE for ipvs->enable
netfilter: nft_meta_bridge: introduce NFT_META_BRI_IIFHWADDR support
netfilter: ipset: Remove unused htable_bits in macro ahash_region
selftest:net: fixed spelling mistakes
====================
Link: https://patch.msgid.link/20250911143819.14753-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently there are a couple of minor issues with destroying the keys
tcp_v4_destroy_sock():
1. The socket is yet in TCP bind buckets, making it reachable for
incoming segments [on another CPU core], potentially available to send
late FIN/ACK/RST replies.
2. There is at least one code path, where tcp_done() is called before
sending RST [kudos to Bob for investigation]. This is a case of
a server, that finished sending its data and just called close().
The socket is in TCP_FIN_WAIT2 and has RCV_SHUTDOWN (set by
__tcp_close())
tcp_v4_do_rcv()/tcp_v6_do_rcv()
tcp_rcv_state_process() /* LINUX_MIB_TCPABORTONDATA */
tcp_reset()
tcp_done_with_error()
tcp_done()
inet_csk_destroy_sock() /* Destroys AO/MD5 keys */
/* tcp_rcv_state_process() returns SKB_DROP_REASON_TCP_ABORT_ON_DATA */
tcp_v4_send_reset() /* Sends an unsigned RST segment */
tcpdump:
> 22:53:15.399377 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 33929, offset 0, flags [DF], proto TCP (6), length 60)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [F.], seq 2185658590, ack 3969644355, win 502, options [nop,nop,md5 valid], length 0
> 22:53:15.399396 00:00:01:01:00:00 > 00:00:b2:1f:00:00, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 51951, offset 0, flags [DF], proto TCP (6), length 72)
> 1.0.0.2.49848 > 1.0.0.1.34567: Flags [.], seq 3969644375, ack 2185658591, win 128, options [nop,nop,md5 valid,nop,nop,sack 1 {2185658590:2185658591}], length 0
> 22:53:16.429588 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658590, win 0, length 0
> 22:53:16.664725 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
> 22:53:17.289832 00:00:b2:1f:00:00 > 00:00:01:01:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
> 1.0.0.1.34567 > 1.0.0.2.49848: Flags [R], seq 2185658591, win 0, options [nop,nop,md5 valid], length 0
Note the signed RSTs later in the dump - those are sent by the server
when the fin-wait socket gets removed from hash buckets, by
the listener socket.
Instead of destroying AO/MD5 info and their keys in inet_csk_destroy_sock(),
slightly delay it until the actual socket .sk_destruct(). As shutdown'ed
socket can yet send non-data replies, they should be signed in order for
the peer to process them. Now it also matches how AO/MD5 gets destructed
for TIME-WAIT sockets (in tcp_twsk_destructor()).
This seems optimal for TCP-MD5, while for TCP-AO it seems to have an
open problem: once RST get sent and socket gets actually destructed,
there is no information on the initial sequence numbers. So, in case
this last RST gets lost in the network, the server's listener socket
won't be able to properly sign another RST. Nothing in RFC 1122
prescribes keeping any local state after non-graceful reset.
Luckily, BGP are known to use keep alive(s).
While the issue is quite minor/cosmetic, these days monitoring network
counters is a common practice and getting invalid signed segments from
a trusted BGP peer can get customers worried.
Investigated-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://patch.msgid.link/20250909-b4-tcp-ao-md5-rst-finwait2-v5-1-9ffaaaf8b236@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Correct typos in ipv6/udp.c comments:
"execeeds" -> "exceeds"
"tacking care" -> "taking care"
"measureable" -> "measurable"
No functional changes.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250909122611.3711859-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tcp reject code won't reply to a tcp reset.
But the icmp reject 'netdev' family versions will reply to icmp
dst-unreach errors, unlike icmp_send() and icmp6_send() which are used
by the inet family implementation (and internally by the REJECT target).
Check for the icmp(6) type and do not respond if its an unreachable error.
Without this, something like 'ip protocol icmp reject', when used
in a netdev chain attached to 'lo', cause a packet loop.
Same for two hosts that both use such a rule: each error packet
will be replied to.
Such situation persist until the (bogus) rule is amended to ratelimit or
checks the icmp type before the reject statement.
As the inet versions don't do this make the netdev ones follow along.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Blamed commit added a critical false sharing on a single
atomic_long_t under DOS, like receiving UDP packets
to closed ports.
Per netns ICMP6_MIB_RATELIMITHOST tracking uses per-cpu
storage and is enough, we do not need per-device and slow tracking.
Fixes: d0941130c9351 ("icmp: Add counters for rate limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Cc: Abhishek Rawal <rawal.abhishek92@gmail.com>
Link: https://patch.msgid.link/20250905165813.1470708-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use ARRAY_SIZE(), so that we know the limit at compile time.
Following patch needs this preliminary change.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250905165813.1470708-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This 2KB array can be replaced by a switch() to save space.
Before:
$ size net/ipv6/proc.o
text data bss dec hex filename
6410 624 0 7034 1b7a net/ipv6/proc.o
After:
$ size net/ipv6/proc.o
text data bss dec hex filename
5516 592 0 6108 17dc net/ipv6/proc.o
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20250905165813.1470708-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.17-rc5).
No conflicts.
Adjacent changes:
include/net/sock.h
c51613fa276f ("net: add sk->sk_drop_counters")
5d6b58c932ec ("net: lockless sock_i_ino()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extract the dst lookup logic from ipip6_tunnel_xmit() into new helper
ipip6_tunnel_dst_find() to reduce code duplication and enhance readability.
No functional change intended.
On a x86_64, with allmodconfig object size is also reduced:
./scripts/bloat-o-meter net/ipv6/sit.o net/ipv6/sit-new.o
add/remove: 5/3 grow/shrink: 3/4 up/down: 1841/-2275 (-434)
Function old new delta
ipip6_tunnel_dst_find - 1697 +1697
__pfx_ipip6_tunnel_dst_find - 64 +64
__UNIQUE_ID_modinfo2094 - 43 +43
ipip6_tunnel_xmit.isra.cold 79 88 +9
__UNIQUE_ID_modinfo2096 12 20 +8
__UNIQUE_ID___addressable_init_module2092 - 8 +8
__UNIQUE_ID___addressable_cleanup_module2093 - 8 +8
__func__ 55 59 +4
__UNIQUE_ID_modinfo2097 20 18 -2
__UNIQUE_ID___addressable_init_module2093 8 - -8
__UNIQUE_ID___addressable_cleanup_module2094 8 - -8
__UNIQUE_ID_modinfo2098 18 - -18
__UNIQUE_ID_modinfo2095 43 12 -31
descriptor 112 56 -56
ipip6_tunnel_xmit.isra 9910 7758 -2152
Total: Before=72537, After=72103, chg -0.60%
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://patch.msgid.link/20250901114857.1968513-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter: updates for net-next
1) prefer vmalloc_array in ebtables, from Qianfeng Rong.
2) Use csum_replace4 instead of open-coding it, from Christophe Leroy.
3+4) Get rid of GFP_ATOMIC in transaction object allocations, those
cause silly failures with large sets under memory pressure, from
myself.
5) Remove test for AVX cpu feature in nftables pipapo set type,
testing for AVX2 feature is sufficient.
6) Unexport a few function in nf_reject infra: no external callers.
7) Extend payload offset to u16, this was restricted to values <=255
so far, from Fernando Fernandez Mancera.
* tag 'nf-next-25-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nft_payload: extend offset to 65535 bytes
netfilter: nf_reject: remove unneeded exports
netfilter: nft_set_pipapo: remove redundant test for avx feature bit
netfilter: nf_tables: all transaction allocations can now sleep
netfilter: nf_tables: allow iter callbacks to sleep
netfilter: nft_payload: Use csum_replace4() instead of opencoding
netfilter: ebtables: Use vmalloc_array() to improve code
====================
Link: https://patch.msgid.link/20250902133549.15945-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In ipv6_rpl_srh_rcv() we use min(net->ipv6.devconf_all->rpl_seg_enabled,
idev->cnf.rpl_seg_enabled) is intended to return 0 when either value is
zero, but if one of the values is negative it will in fact return non-zero.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250901123726.1972881-3-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
devconf->rpl_seg_enabled can be changed concurrently from
/proc/sys/net/ipv6/conf, annotate lockless reads on it.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250901123726.1972881-2-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When tcp_ao_copy_all_matching() fails in tcp_v6_syn_recv_sock() it just
exits the function. This ends up causing a memory-leak:
unreferenced object 0xffff0000281a8200 (size 2496):
comm "softirq", pid 0, jiffies 4295174684
hex dump (first 32 bytes):
7f 00 00 06 7f 00 00 06 00 00 00 00 cb a8 88 13 ................
0a 00 03 61 00 00 00 00 00 00 00 00 00 00 00 00 ...a............
backtrace (crc 5ebdbe15):
kmemleak_alloc+0x44/0xe0
kmem_cache_alloc_noprof+0x248/0x470
sk_prot_alloc+0x48/0x120
sk_clone_lock+0x38/0x3b0
inet_csk_clone_lock+0x34/0x150
tcp_create_openreq_child+0x3c/0x4a8
tcp_v6_syn_recv_sock+0x1c0/0x620
tcp_check_req+0x588/0x790
tcp_v6_rcv+0x5d0/0xc18
ip6_protocol_deliver_rcu+0x2d8/0x4c0
ip6_input_finish+0x74/0x148
ip6_input+0x50/0x118
ip6_sublist_rcv+0x2fc/0x3b0
ipv6_list_rcv+0x114/0x170
__netif_receive_skb_list_core+0x16c/0x200
netif_receive_skb_list_internal+0x1f0/0x2d0
This is because in tcp_v6_syn_recv_sock (and the IPv4 counterpart), when
exiting upon error, inet_csk_prepare_forced_close() and tcp_done() need
to be called. They make sure the newsk will end up being correctly
free'd.
tcp_v4_syn_recv_sock() makes this very clear by having the put_and_exit
label that takes care of things. So, this patch here makes sure
tcp_v4_syn_recv_sock and tcp_v6_syn_recv_sock have similar
error-handling and thus fixes the leak for TCP-AO.
Fixes: 06b22ef29591 ("net/tcp: Wire TCP-AO to request sockets")
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250830-tcpao_leak-v1-1-e5878c2c3173@openai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These functions have no external callers and can be static.
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
There is no point in keeping ping_hash().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://patch.msgid.link/20250829153054.474201-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The icmp_ndo_send function was originally introduced to ensure proper
rate limiting when icmp_send is called by a network device driver,
where the packet's source address may have already been transformed
by SNAT.
However, the original implementation only considers the
IP_CT_DIR_ORIGINAL direction for SNAT and always replaced the packet's
source address with that of the original-direction tuple. This causes
two problems:
1. For SNAT:
Reply-direction packets were incorrectly translated using the source
address of the CT original direction, even though no translation is
required.
2. For DNAT:
Reply-direction packets were not handled at all. In DNAT, the original
direction's destination is translated. Therefore, in the reply
direction the source address must be set to the reply-direction
source, so rate limiting works as intended.
Fix this by using the connection direction to select the correct tuple
for source address translation, and adjust the pre-checks to handle
reply-direction packets in case of DNAT.
Additionally, wrap the `ct->status` access in READ_ONCE(). This avoids
possible KCSAN reports about concurrent updates to `ct->status`.
Fixes: 0b41713b6066 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Fabian Bläse <fabian@blaese.de>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
TCP tracks the number of orphaned (SOCK_DEAD but not yet destructed)
sockets in tcp_orphan_count.
In some code that was shared with DCCP, tcp_orphan_count is referenced
via sk->sk_prot->orphan_count.
Let's reference tcp_orphan_count directly.
inet_csk_prepare_for_destroy_sock() is moved to inet_connection_sock.c
due to header dependency.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250829215641.711664-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use RCU in ip6_output() in order to use dst_dev_rcu() to prevent
possible UAF.
We can remove rcu_read_lock()/rcu_read_unlock() pairs
from ip6_finish_output2().
Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use RCU in ip6_xmit() in order to use dst_dev_rcu() to prevent
possible UAF.
Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor icmpv6_xrlim_allow() and ip6_dst_hoplimit()
so that we acquire rcu_read_lock() a bit longer
to be able to use dst_dev_rcu() instead of dst_dev().
__ip6_rt_update_pmtu() and rt6_do_redirect can directly
use dst_dev_rcu() in sections already holding rcu_read_lock().
Small changes to use dst_dev_net_rcu() in
ip6_default_advmss(), ipv6_sock_ac_join(),
ip6_mc_find_dev() and ndisc_send_skb().
Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250828195823.3958522-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a packet flood hits one or more RAW sockets, many cpus
have to update sk->sk_drops.
This slows down other cpus, because currently
sk_drops is in sock_write_rx group.
Add a socket_drop_counters structure to raw sockets.
Using dedicated cache lines to hold drop counters
makes sure that consumers no longer suffer from
false sharing if/when producers only change sk->sk_drops.
This adds 128 bytes per RAW socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-6-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Existing sk_drops_add() helper is renamed to sk_drops_skbadd().
Add sk_drops_add() and convert sk_drops_inc() to use it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-3-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We want to split sk->sk_drops in the future to reduce
potential contention on this field.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250826125031.1578842-2-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Prepare the HMAC key when it is added to the kernel, instead of
preparing it implicitly for every packet. This significantly improves
the performance of seg6_hmac_compute(). A microbenchmark on x86_64
shows seg6_hmac_compute() (with HMAC-SHA256) dropping from ~1978 cycles
to ~1419 cycles, a 28% improvement.
The size of 'struct seg6_hmac_info' increases by 128 bytes, but that
should be fine, since there should not be a massive number of keys.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250824013644.71928-3-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the HMAC-SHA1 and HMAC-SHA256 library functions instead of
crypto_shash. This is simpler and faster. Pre-allocating per-CPU hash
transformation objects and descriptors is no longer needed, and a
microbenchmark on x86_64 shows seg6_hmac_compute() (with HMAC-SHA256)
dropping from ~2494 cycles to ~1978 cycles, a 20% improvement.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20250824013644.71928-2-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These socket lookup functions required struct inet_hashinfo because
they are shared by TCP and DCCP.
* __inet_lookup_established()
* __inet_lookup_listener()
* __inet6_lookup_established()
* inet6_lookup_listener()
DCCP has gone, and we don't need to pass hashinfo down to them.
Let's fetch net->ipv4.tcp_death_row.hashinfo directly in the above
4 functions.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250822190803.540788-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 6c886db2e78c ("net: remove duplicate sk_lookup helpers")
started to check if hashinfo == net->ipv4.tcp_death_row.hashinfo
in __inet_lookup_listener() and inet6_lookup_listener() and
stopped invoking BPF sk_lookup prog for DCCP.
DCCP has gone and the condition is always true.
Let's remove the hashinfo test.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250822190803.540788-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since DCCP has been removed, sk->sk_prot->twsk_prot->twsk_destructor
is always tcp_twsk_destructor().
Let's call tcp_twsk_destructor() directly in inet_twsk_free() and
remove ->twsk_destructor().
While at it, tcp_twsk_destructor() is un-exported.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250822190803.540788-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 383eed2de529 ("tcp: get rid of twsk_unique()") added
sk->sk_protocol test in __inet_check_established() and
__inet6_check_established() to remove twsk_unique() and call
tcp_twsk_unique() directly.
DCCP has gone, and the condition is always true.
Let's remove the sk_protocol test.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250822190803.540788-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extract the same code logic from __ipv6_sock_mc_join() and
ip6_mc_find_dev(), also add new helper ip6_mc_find_idev() to
reduce redundancy and enhance readability.
No functional changes intended.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Link: https://patch.msgid.link/20250822064051.2991480-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
icsk->icsk_probes_out is read locklessly from inet_sk_diag_fill(),
get_tcp4_sock() and get_tcp6_sock().
Add corresponding READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250822091727.835869-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
icsk->icsk_retransmits is read locklessly from inet_sk_diag_fill(),
tcp_get_timestamping_opt_stats, get_tcp4_sock() and get_tcp6_sock().
Add corresponding READ_ONCE()/WRITE_ONCE() annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250822091727.835869-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.17-rc3).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
recent patches to add a WARN() when replacing skb dst entry found an
old bug:
WARNING: include/linux/skbuff.h:1165 skb_dst_check_unset include/linux/skbuff.h:1164 [inline]
WARNING: include/linux/skbuff.h:1165 skb_dst_set include/linux/skbuff.h:1210 [inline]
WARNING: include/linux/skbuff.h:1165 nf_reject_fill_skb_dst+0x2a4/0x330 net/ipv4/netfilter/nf_reject_ipv4.c:234
[..]
Call Trace:
nf_send_unreach+0x17b/0x6e0 net/ipv4/netfilter/nf_reject_ipv4.c:325
nft_reject_inet_eval+0x4bc/0x690 net/netfilter/nft_reject_inet.c:27
expr_call_ops_eval net/netfilter/nf_tables_core.c:237 [inline]
..
This is because blamed commit forgot about loopback packets.
Such packets already have a dst_entry attached, even at PRE_ROUTING stage.
Instead of checking hook just check if the skb already has a route
attached to it.
Fixes: f53b9b0bdc59 ("netfilter: introduce support for reject at prerouting stage")
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20250820123707.10671-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
SO_RCVBUF and SO_SNDBUF have limited range today, unless
distros or system admins change rmem_max and wmem_max.
Even iproute2 uses 1 MB SO_RCVBUF which is capped by
the kernel.
Decouple [rw]mem_max and [rw]mem_default and increase
[rw]mem_max to 4 MB.
Before:
$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.wmem_default = 212992
net.core.wmem_max = 212992
After:
$ sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max
net.core.rmem_default = 212992
net.core.rmem_max = 4194304
net.core.wmem_default = 212992
net.core.wmem_max = 4194304
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250819174030.1986278-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To prevent timing attacks, MACs need to be compared in constant time.
Use the appropriate helper function for this.
Fixes: bf355b8d2c30 ("ipv6: sr: add core files for SR HMAC support")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Link: https://patch.msgid.link/20250818202724.15713-1-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace the strcpy() call that copies the device name into
tunnel->parms.name with strscpy(), to avoid potential overflow
and guarantee NULL termination. This uses the two-argument
form of strscpy(), where the destination size is inferred
from the array type.
Destination is tunnel->parms.name (size IFNAMSIZ).
Tested in QEMU (Alpine rootfs):
- Created IPv6 GRE tunnels over loopback
- Assigned overlay IPv6 addresses
- Verified bidirectional ping through the tunnel
- Changed tunnel parameters at runtime (`ip -6 tunnel change`)
Signed-off-by: Miguel García <miguelgarciaroman8@gmail.com>
Link: https://patch.msgid.link/20250818220203.899338-1-miguelgarciaroman8@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|