diff options
| author | Jakub Kicinski <kuba@kernel.org> | 2022-06-10 16:21:39 -0700 |
|---|---|---|
| committer | Jakub Kicinski <kuba@kernel.org> | 2022-06-10 16:21:40 -0700 |
| commit | e10b02ee5b6c95872064cf0a8e65f31951a31967 (patch) | |
| tree | e061107c999e33aac6a61f87cd45a24cd4258422 /net/ipv4/tcp_input.c | |
| parent | 5c281b4e529cd5a73b32ac561d79f448d18dda6f (diff) | |
| parent | 0f2c2693988aeeb4c83a581fe58a28d526eecd39 (diff) | |
Merge branch 'net-reduce-tcp_memory_allocated-inflation'
Eric Dumazet says:
====================
net: reduce tcp_memory_allocated inflation
Hosts with a lot of sockets tend to hit so called TCP memory pressure,
leading to very bad TCP performance and/or OOM.
The problem is that some TCP sockets can hold up to 2MB of 'forward
allocations' in their per-socket cache (sk->sk_forward_alloc),
and there is no mechanism to make them relinquish their share
under mem pressure.
Only under some potentially rare events their share is reclaimed,
one socket at a time.
In this series, I implemented a per-cpu cache instead of a per-socket one.
Each CPU has a +1/-1 MB (256 pages on x86) forward alloc cache, in order
to not dirty tcp_memory_allocated shared cache line too often.
We keep sk->sk_forward_alloc values as small as possible, to meet
memcg page granularity constraint.
Note that memcg already has a per-cpu cache, although MEMCG_CHARGE_BATCH
is defined to 32 pages, which seems a bit small.
Note that while this cover letter mentions TCP, this work is generic
and supports TCP, UDP, DECNET, SCTP.
====================
Link: https://lore.kernel.org/r/20220609063412.2205738-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Diffstat (limited to 'net/ipv4/tcp_input.c')
| -rw-r--r-- | net/ipv4/tcp_input.c | 6 |
1 files changed, 1 insertions, 5 deletions
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 2e2a9ece9af2..fdc7beb81b68 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -805,7 +805,6 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb) * restart window, so that we send ACKs quickly. */ tcp_incr_quickack(sk, TCP_MAX_QUICKACKS); - sk_mem_reclaim(sk); } } icsk->icsk_ack.lrcvtime = now; @@ -4390,7 +4389,6 @@ void tcp_fin(struct sock *sk) skb_rbtree_purge(&tp->out_of_order_queue); if (tcp_is_sack(tp)) tcp_sack_reset(&tp->rx_opt); - sk_mem_reclaim(sk); if (!sock_flag(sk, SOCK_DEAD)) { sk->sk_state_change(sk); @@ -5287,7 +5285,7 @@ new_range: before(TCP_SKB_CB(skb)->end_seq, start)) { /* Do not attempt collapsing tiny skbs */ if (range_truesize != head->truesize || - end - start >= SKB_WITH_OVERHEAD(SK_MEM_QUANTUM)) { + end - start >= SKB_WITH_OVERHEAD(PAGE_SIZE)) { tcp_collapse(sk, NULL, &tp->out_of_order_queue, head, skb, start, end); } else { @@ -5336,7 +5334,6 @@ static bool tcp_prune_ofo_queue(struct sock *sk) tcp_drop_reason(sk, rb_to_skb(node), SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE); if (!prev || goal <= 0) { - sk_mem_reclaim(sk); if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf && !tcp_under_memory_pressure(sk)) break; @@ -5383,7 +5380,6 @@ static int tcp_prune_queue(struct sock *sk) skb_peek(&sk->sk_receive_queue), NULL, tp->copied_seq, tp->rcv_nxt); - sk_mem_reclaim(sk); if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf) return 0; |
