| author | Martin KaFai Lau <martin.lau@kernel.org> | 2024-01-16 14:42:40 -0800 |
|---|---|---|
| committer | Alexei Starovoitov <ast@kernel.org> | 2024-01-23 14:42:51 -0800 |
| commit | 4eaafe5a5b7b5f2fcec22914bc5b8b2d860896b7 | |
| tree | e9af2ccb810881632330d014ecc83b1726537a16 /include | |
| parent | d177c1be06ce28aa8c8710ac55be1b5ad3f314c6 | |
| parent | a74712241b4675175cd8e3310fa206d8756ad5a1 | |
Merge branch 'bpf: tcp: Support arbitrary SYN Cookie at TC.'
Kuniyuki Iwashima says:
====================
Under a SYN flood, the TCP stack generates a SYN Cookie to remain stateless
for the connection request until a valid ACK arrives in response to the
SYN+ACK. The cookie contains two kinds of host-specific bits, a timestamp
and secrets, so it can be validated only by the host that generated it.
This means a SYN Cookie consumes network resources between the client and
the server: intermediate nodes must remember which node to route the ACK
for each cookie to.
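
To make the "timestamp and secrets" point concrete, here is a minimal userspace sketch of a cookie that only its generator can validate. It is an illustration under stated assumptions, not the kernel's algorithm: `struct flow`, `toy_hash()`, `toy_cookie_make()`, `toy_cookie_check()` and the constants are hypothetical names, the mixer is not cryptographic, and the bit layout is only loosely modelled on the classic "hash + counter + hashed data" scheme.

```c
#include <assert.h>
#include <stdint.h>

#define COOKIE_BITS	24			/* low bits carrying hashed data */
#define COOKIE_MASK	((1u << COOKIE_BITS) - 1)
#define MAX_COOKIE_AGE	2			/* accepted age, in counter ticks */

/* Hypothetical flow key; not a kernel structure. */
struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Toy keyed mixer standing in for a real keyed hash. NOT cryptographic. */
static uint32_t toy_hash(const struct flow *f, uint32_t secret, uint32_t count)
{
	uint32_t words[4] = { f->saddr, f->daddr,
			      ((uint32_t)f->sport << 16) | f->dport, count };
	uint32_t h = secret ^ 0x9e3779b9u;

	for (int i = 0; i < 4; i++) {
		h ^= words[i];
		h *= 0x01000193u;
		h ^= h >> 15;
	}
	return h;
}

/* Build a cookie that carries `data` (for example an MSS table index). */
static uint32_t toy_cookie_make(const struct flow *f, uint32_t secret,
				uint32_t count, uint32_t data)
{
	assert(data <= COOKIE_MASK);
	return toy_hash(f, secret, 0) + (count << COOKIE_BITS) +
	       ((toy_hash(f, ~secret, count) + data) & COOKIE_MASK);
}

/* Return the embedded data, or -1 if the cookie is MAX_COOKIE_AGE ticks old. */
static int toy_cookie_check(const struct flow *f, uint32_t secret,
			    uint32_t now, uint32_t cookie)
{
	uint32_t diff;

	cookie -= toy_hash(f, secret, 0);
	/* The top 8 bits now hold the low byte of the generator's counter. */
	diff = (now - (cookie >> COOKIE_BITS)) & 0xff;
	if (diff >= MAX_COOKIE_AGE)
		return -1;

	return (int)((cookie - toy_hash(f, ~secret, now - diff)) & COOKIE_MASK);
}
```

In this toy, a cookie checked with the wrong secret decodes to arbitrary data, so the caller is expected to range-check the returned value. Both the secret and the counter live on the generating host, which is why another node cannot validate the ACK.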

SYN Proxy reduces such unwanted resource allocation by handling 3WHS at
the edge network. After SYN Proxy completes 3WHS, it forwards SYN to the
backend server and completes another 3WHS. However, since the server's
ISN differs from the cookie, the proxy must manage the ISN mappings and
fix up SEQ/ACK numbers in every packet for each connection. If a proxy
node goes down, all the connections through it are terminated. Keeping
state at the proxy is painful from that perspective.
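
The per-connection state a conventional SYN Proxy has to keep can be sketched as follows. This is a hedged illustration only: `struct isn_map` and both helper names are hypothetical, and a real proxy would also have to rewrite SACK blocks and checksums.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state held by a stateful SYN proxy. */
struct isn_map {
	uint32_t cookie_isn;	/* ISN the proxy sent to the client (the cookie) */
	uint32_t server_isn;	/* ISN the backend chose in its own SYN+ACK */
};

/* Server to client: move SEQ into the sequence space the client knows. */
static uint32_t fixup_seq_to_client(const struct isn_map *m, uint32_t seq)
{
	/* Unsigned arithmetic wraps modulo 2^32, as TCP sequence numbers do. */
	return seq - m->server_isn + m->cookie_isn;
}

/* Client to server: move ACK into the sequence space the server knows. */
static uint32_t fixup_ack_to_server(const struct isn_map *m, uint32_t ack)
{
	return ack - m->cookie_isn + m->server_isn;
}
```

Every packet of every connection needs one of these translations, and the mapping is lost if the proxy node holding it goes down.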

At AWS, we use a dirty hack to build a truly stateless SYN Proxy at scale.
Our SYN Proxy consists of the front proxy layer and the backend kernel
module. (See the LPC2023 slides [0], p37 - p48.)

The cookie that SYN Proxy generates differs from the kernel's cookie in
that it contains a secret (called rolling salt) that is (i) shared by all
the proxy nodes so that any node can validate ACK and (ii) updated
periodically so that old cookies cannot be validated and we need not
encode a timestamp in the cookie. Also, WScale, SACK, and ECN are encoded
in the ISN, not in TS val. This avoids sacrificing connection quality for
customers who turn off the TCP timestamps option due to a retro CVE.
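
The text does not give the proxy's actual encoding, so the following is only a sketch of the idea under assumed names and an assumed bit layout: the options ride in the low bits of the ISN, and the remaining bits are a keyed value over the flow, the shared salt and the options. `struct isn_opts`, `toy_mac()`, `flow_id`, and the 8-bit option field are all hypothetical, and the mixer is not cryptographic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OPT_BITS	8			/* assumed width of the option field */
#define OPT_MASK	((1u << OPT_BITS) - 1)

/* Hypothetical set of options carried inside the ISN. */
struct isn_opts {
	uint8_t snd_wscale;	/* client's window scale, 0..14 */
	bool wscale_ok;
	bool sack_ok;
	bool ecn_ok;
};

/* Toy keyed mixer over flow, salt and options. NOT cryptographic. */
static uint32_t toy_mac(uint32_t flow_id, uint32_t salt, uint32_t packed)
{
	uint32_t h = salt ^ 0x9e3779b9u;

	h = (h ^ flow_id) * 0x01000193u;
	h ^= h >> 15;
	h = (h ^ packed) * 0x01000193u;
	h ^= h >> 15;
	return h;
}

static uint32_t opts_pack(const struct isn_opts *o)
{
	assert(o->snd_wscale < 15);
	return (uint32_t)o->snd_wscale |
	       ((uint32_t)o->wscale_ok << 4) |
	       ((uint32_t)o->sack_ok << 5) |
	       ((uint32_t)o->ecn_ok << 6);
}

/* Any proxy node that knows the current salt can produce this ISN. */
static uint32_t proxy_isn_make(uint32_t flow_id, uint32_t salt,
			       const struct isn_opts *o)
{
	uint32_t packed = opts_pack(o);

	return (toy_mac(flow_id, salt, packed) & ~OPT_MASK) | packed;
}

/* ...and any node with the same salt can validate it and recover options. */
static bool proxy_isn_check(uint32_t flow_id, uint32_t salt, uint32_t isn,
			    struct isn_opts *out)
{
	uint32_t packed = isn & OPT_MASK;

	if ((toy_mac(flow_id, salt, packed) & ~OPT_MASK) != (isn & ~OPT_MASK))
		return false;

	out->snd_wscale = packed & 0xf;
	out->wscale_ok = packed & (1u << 4);
	out->sack_ok = packed & (1u << 5);
	out->ecn_ok = packed & (1u << 6);
	return true;
}
```

Because no timestamp is encoded, cookie expiry in this model comes only from rotating the salt, which matches point (ii) above.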

After 3WHS, the proxy restores SYN, encapsulates ACK into SYN, and forwards
the TCP-in-TCP packet to the backend server. Our kernel module works at
Netfilter input/output hooks and first feeds SYN to the TCP stack to
initiate 3WHS. When the module is triggered for SYN+ACK, it looks up the
corresponding request socket and overwrites tcp_rsk(req)->snt_isn with the
proxy's cookie. Then, the module can complete 3WHS with the original ACK
as is.
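
Why overwriting snt_isn is enough can be seen in a heavily simplified userspace model. `struct mock_req` and `ack_completes_3whs()` are hypothetical names, and the real acceptance check in the TCP stack involves far more than these two comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal stand-in for the fields of a TCP request socket. */
struct mock_req {
	uint32_t snt_isn;	/* ISN the server believes it sent in SYN+ACK */
	uint32_t rcv_isn;	/* ISN received in the client's SYN */
};

/* Simplified test: does this ACK acknowledge our SYN+ACK? */
static bool ack_completes_3whs(const struct mock_req *req,
			       uint32_t seq, uint32_t ack_seq)
{
	return seq == req->rcv_isn + 1 && ack_seq == req->snt_isn + 1;
}
```

The client's ACK acknowledges the proxy's cookie plus one, so it only matches once the request socket's snt_isn has been replaced by that cookie.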

This way, our SYN Proxy neither manages the ISN mappings nor waits for
SYN+ACK from the backend, and thus can remain stateless. It works very
well for high-bandwidth services, on the order of multiple Tbps, but we
are looking for a way to drop the dirty hack and further optimise the
sequences.

If we could validate an arbitrary SYN Cookie on the backend server with
BPF, the proxy would not need to restore SYN or pass it on. After
validating ACK, the proxy node just needs to forward it, and then the
server can do the lightweight validation (e.g. check if ACK came from
proxy nodes, etc) and create a connection from the ACK.

This series allows us to create a full sk from an arbitrary SYN Cookie,
which is done in 3 steps.

  1) At tc, BPF prog calls a new kfunc to create a reqsk and configure
     it based on the argument populated from SYN Cookie. The reqsk has
     its listener as req->rsk_listener and is passed to the TCP stack as
     skb->sk.

  2) During TCP socket lookup for the skb, skb_steal_sock() returns a
     listener in the reuseport group that inet_reqsk(skb->sk)->rsk_listener
     belongs to.

  3) In cookie_v[46]_check(), the reqsk (skb->sk) is fully initialised and
     a full sk is created.
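
Step 2 can be modelled in userspace as below. This is a sketch with mock types (`struct mock_sock`, `struct mock_skb`, `mock_steal_sock()` and the `MOCK_*` states are all hypothetical); it follows the control flow of the patched skb_steal_sock() but leaves out the reuseport group selection and the CONFIG_SYN_COOKIES guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mock_state { MOCK_LISTEN, MOCK_ESTABLISHED, MOCK_NEW_SYN_RECV };

/* Hypothetical stand-in covering both full sockets and request sockets. */
struct mock_sock {
	enum mock_state state;
	bool syncookie;			/* set on a reqsk built from a cookie */
	bool refcounted;		/* stands in for sk_is_refcounted() */
	struct mock_sock *rsk_listener;	/* listener attached to a reqsk */
};

struct mock_skb {
	struct mock_sock *sk;
	bool prefetched;		/* stands in for skb_sk_is_prefetched() */
};

static struct mock_sock *mock_steal_sock(struct mock_skb *skb,
					 bool *refcounted, bool *prefetched)
{
	struct mock_sock *sk = skb->sk;

	if (!sk) {
		*prefetched = false;
		*refcounted = false;
		return NULL;
	}

	*prefetched = skb->prefetched;
	if (*prefetched) {
		if (sk->state == MOCK_NEW_SYN_RECV && sk->syncookie) {
			struct mock_sock *req = sk;

			/* Hand back the listener; the reqsk stays on the skb. */
			*refcounted = false;
			sk = req->rsk_listener;
			req->rsk_listener = NULL;
			return sk;
		}
		*refcounted = sk->refcounted;
	} else {
		*refcounted = true;
	}

	skb->sk = NULL;
	return sk;
}
```

The point of the special case is that the lookup proceeds with the listener while skb->sk keeps pointing at the reqsk, so that step 3 can still find it.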

The kfunc usage is as follows:

    struct bpf_tcp_req_attrs attrs = {
        .mss = mss,
        .wscale_ok = wscale_ok,
        .rcv_wscale = rcv_wscale, /* Server's WScale < 15 */
        .snd_wscale = snd_wscale, /* Client's WScale < 15 */
        .tstamp_ok = tstamp_ok,
        .rcv_tsval = tsval,
        .rcv_tsecr = tsecr, /* Server's Initial TSval */
        .usec_ts_ok = usec_ts_ok,
        .sack_ok = sack_ok,
        .ecn_ok = ecn_ok,
    };

    skc = bpf_skc_lookup_tcp(...);
    sk = (struct sock *)bpf_skc_to_tcp_sock(skc);
    bpf_sk_assign_tcp_reqsk(skb, sk, &attrs, sizeof(attrs));
    bpf_sk_release(skc);
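
Before calling the kfunc, a BPF program has to fill the attributes consistently. The userspace sketch below mirrors the field layout of struct bpf_tcp_req_attrs and checks only the constraints that this message states (WScale below 15, the three reserved bytes tested as null, and a matching size argument). `struct req_attrs_mirror`, `req_attrs_validate()` and `MAX_WSCALE` are hypothetical names, and the real kfunc may enforce more than this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WSCALE 14	/* the usage comments above require WScale < 15 */

/* Userspace mirror of the field layout of struct bpf_tcp_req_attrs. */
struct req_attrs_mirror {
	uint32_t rcv_tsval;
	uint32_t rcv_tsecr;
	uint16_t mss;
	uint8_t rcv_wscale;
	uint8_t snd_wscale;
	uint8_t ecn_ok;
	uint8_t wscale_ok;
	uint8_t sack_ok;
	uint8_t tstamp_ok;
	uint8_t usec_ts_ok;
	uint8_t reserved[3];	/* explicit padding, must stay zero */
};

/* Return 0 if the attributes look acceptable, -EINVAL otherwise. */
static int req_attrs_validate(const struct req_attrs_mirror *attrs,
			      size_t attrs_sz)
{
	if (attrs_sz != sizeof(*attrs))
		return -EINVAL;

	if (attrs->reserved[0] || attrs->reserved[1] || attrs->reserved[2])
		return -EINVAL;

	if (attrs->rcv_wscale > MAX_WSCALE || attrs->snd_wscale > MAX_WSCALE)
		return -EINVAL;

	return 0;
}
```

Initialising the structure with a designated initialiser, as in the usage snippet, zeroes the reserved bytes automatically.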
[0]: https://lpc.events/event/17/contributions/1645/attachments/1350/2701/SYN_Proxy_at_Scale_with_BPF.pdf
Changes:
v8
  * Rebase on Yonghong's cpuv4 fix
  * Patch 5
    * Fill the trailing 3-bytes padding in struct bpf_tcp_req_attrs
      and test it as null
  * Patch 6
    * Remove unused IPPROTO_MPTCP definition

v7: https://lore.kernel.org/bpf/20231221012806.37137-1-kuniyu@amazon.com/
  * Patch 5 & 6
    * Drop MPTCP support

v6: https://lore.kernel.org/bpf/20231214155424.67136-1-kuniyu@amazon.com/
  * Patch 5 & 6
    * /struct /s/tcp_cookie_attributes/bpf_tcp_req_attrs/
    * Don't reuse struct tcp_options_received and use u8 for each attrs
  * Patch 6
    * Check retval of test__start_subtest()

v5: https://lore.kernel.org/netdev/20231211073650.90819-1-kuniyu@amazon.com/
  * Split patch 1-3
  * Patch 3
    * Clear req->rsk_listener in skb_steal_sock()
  * Patch 4 & 5
    * Move sysctl validation and tsoff init from cookie_bpf_check() to kfunc
  * Patch 5
    * Do not increment LINUX_MIB_SYNCOOKIES(RECV|FAILED)
  * Patch 6
    * Remove __always_inline
    * Test if tcp_handle_{syn,ack}() is executed
    * Move some definition to bpf_tracing_net.h
    * s/BPF_F_CURRENT_NETNS/-1/

v4: https://lore.kernel.org/bpf/20231205013420.88067-1-kuniyu@amazon.com/
  * Patch 1 & 2
    * s/CONFIG_SYN_COOKIE/CONFIG_SYN_COOKIES/
  * Patch 1
    * Don't set rcv_wscale for BPF SYN Cookie case.
  * Patch 2
    * Add test for tcp_opt.{unused,rcv_wscale} in kfunc
    * Modify skb_steal_sock() to avoid resetting skb->sk
    * Support SO_REUSEPORT lookup
  * Patch 3
    * Add CONFIG_SYN_COOKIES to Kconfig for CI
    * Define BPF_F_CURRENT_NETNS

v3: https://lore.kernel.org/netdev/20231121184245.69569-1-kuniyu@amazon.com/
  * Guard kfunc and req->syncookie part in inet6?_steal_sock() with
    CONFIG_SYN_COOKIE

v2: https://lore.kernel.org/netdev/20231120222341.54776-1-kuniyu@amazon.com/
  * Drop SOCK_OPS and move SYN Cookie validation logic to TC with kfunc.
  * Add cleanup patches to reduce discrepancy between cookie_v[46]_check()

v1: https://lore.kernel.org/bpf/20231013220433.70792-1-kuniyu@amazon.com/
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'include')
| -rw-r--r-- | include/net/request_sock.h | 39 |
| -rw-r--r-- | include/net/sock.h | 25 |
| -rw-r--r-- | include/net/tcp.h | 43 |
3 files changed, 82 insertions, 25 deletions
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 144c39db9898..8839133d6f6b 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -83,6 +83,45 @@ static inline struct sock *req_to_sk(struct request_sock *req)
 	return (struct sock *)req;
 }
 
+/**
+ * skb_steal_sock - steal a socket from an sk_buff
+ * @skb: sk_buff to steal the socket from
+ * @refcounted: is set to true if the socket is reference-counted
+ * @prefetched: is set to true if the socket was assigned from bpf
+ */
+static inline struct sock *skb_steal_sock(struct sk_buff *skb,
+					  bool *refcounted, bool *prefetched)
+{
+	struct sock *sk = skb->sk;
+
+	if (!sk) {
+		*prefetched = false;
+		*refcounted = false;
+		return NULL;
+	}
+
+	*prefetched = skb_sk_is_prefetched(skb);
+	if (*prefetched) {
+#if IS_ENABLED(CONFIG_SYN_COOKIES)
+		if (sk->sk_state == TCP_NEW_SYN_RECV && inet_reqsk(sk)->syncookie) {
+			struct request_sock *req = inet_reqsk(sk);
+
+			*refcounted = false;
+			sk = req->rsk_listener;
+			req->rsk_listener = NULL;
+			return sk;
+		}
+#endif
+		*refcounted = sk_is_refcounted(sk);
+	} else {
+		*refcounted = true;
+	}
+
+	skb->destructor = NULL;
+	skb->sk = NULL;
+	return sk;
+}
+
 static inline struct request_sock *
 reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
 	    bool attach_listener)
diff --git a/include/net/sock.h b/include/net/sock.h
index a7f815c7cfdf..32a399fdcbb5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2814,31 +2814,6 @@ sk_is_refcounted(struct sock *sk)
 	return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE);
 }
 
-/**
- * skb_steal_sock - steal a socket from an sk_buff
- * @skb: sk_buff to steal the socket from
- * @refcounted: is set to true if the socket is reference-counted
- * @prefetched: is set to true if the socket was assigned from bpf
- */
-static inline struct sock *
-skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched)
-{
-	if (skb->sk) {
-		struct sock *sk = skb->sk;
-
-		*refcounted = true;
-		*prefetched = skb_sk_is_prefetched(skb);
-		if (*prefetched)
-			*refcounted = sk_is_refcounted(sk);
-		skb->destructor = NULL;
-		skb->sk = NULL;
-		return sk;
-	}
-	*prefetched = false;
-	*refcounted = false;
-	return NULL;
-}
-
 /* Checks if this SKB belongs to an HW offloaded socket
  * and whether any SW fallbacks are required based on dev.
  * Check decrypted mark in case skb_orphan() cleared socket.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index dd78a1181031..451dc1373970 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -577,6 +577,15 @@ static inline u32 tcp_cookie_time(void)
 	return val;
 }
 
+/* Convert one nsec 64bit timestamp to ts (ms or usec resolution) */
+static inline u64 tcp_ns_to_ts(bool usec_ts, u64 val)
+{
+	if (usec_ts)
+		return div_u64(val, NSEC_PER_USEC);
+
+	return div_u64(val, NSEC_PER_MSEC);
+}
+
 u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th,
 			      u16 *mssp);
 __u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mss);
@@ -590,6 +599,40 @@ static inline bool cookie_ecn_ok(const struct net *net, const struct dst_entry *
 		dst_feature(dst, RTAX_FEATURE_ECN);
 }
 
+#if IS_ENABLED(CONFIG_BPF)
+struct bpf_tcp_req_attrs {
+	u32 rcv_tsval;
+	u32 rcv_tsecr;
+	u16 mss;
+	u8 rcv_wscale;
+	u8 snd_wscale;
+	u8 ecn_ok;
+	u8 wscale_ok;
+	u8 sack_ok;
+	u8 tstamp_ok;
+	u8 usec_ts_ok;
+	u8 reserved[3];
+};
+
+static inline bool cookie_bpf_ok(struct sk_buff *skb)
+{
+	return skb->sk;
+}
+
+struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb);
+#else
+static inline bool cookie_bpf_ok(struct sk_buff *skb)
+{
+	return false;
+}
+
+static inline struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk,
+						    struct sk_buff *skb)
+{
+	return NULL;
+}
+#endif
+
 /* From net/ipv6/syncookies.c */
 int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th);
 struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
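
The new tcp_ns_to_ts() helper in the tcp.h hunk selects the timestamp resolution from a single flag. A userspace equivalent, assuming plain 64-bit division in place of div_u64() and locally defined constants, could look like this (`ns_to_ts()` is a hypothetical name):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_USEC	1000ULL
#define NSEC_PER_MSEC	1000000ULL

/* Convert a nanosecond timestamp to usec or msec resolution (truncating). */
static uint64_t ns_to_ts(bool usec_ts, uint64_t ns)
{
	if (usec_ts)
		return ns / NSEC_PER_USEC;

	return ns / NSEC_PER_MSEC;
}
```

For example, 1,500,000 ns maps to 1500 with usec_ts set and to 1 without it.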
