| author | Jakub Kicinski <kuba@kernel.org> | 2022-06-10 16:21:39 -0700 |
|---|---|---|
| committer | Jakub Kicinski <kuba@kernel.org> | 2022-06-10 16:21:40 -0700 |
| commit | e10b02ee5b6c95872064cf0a8e65f31951a31967 | |
| tree | e061107c999e33aac6a61f87cd45a24cd4258422 | net/ipv4/tcp_timer.c |
| parent | 5c281b4e529cd5a73b32ac561d79f448d18dda6f | |
| parent | 0f2c2693988aeeb4c83a581fe58a28d526eecd39 | |
Merge branch 'net-reduce-tcp_memory_allocated-inflation'
Eric Dumazet says:
====================
net: reduce tcp_memory_allocated inflation
Hosts with a lot of sockets tend to hit so-called TCP memory pressure,
leading to very bad TCP performance and/or OOM.
The problem is that some TCP sockets can hold up to 2MB of 'forward
allocations' in their per-socket cache (sk->sk_forward_alloc),
and there is no mechanism to make them relinquish their share
under memory pressure.
Their share is reclaimed only on some potentially rare events,
one socket at a time.
In this series, I implemented a per-cpu cache instead of a per-socket one.
Each CPU has a +1/-1 MB (256 pages on x86) forward-alloc cache,
so that the shared tcp_memory_allocated cache line is not dirtied
too often (a standalone sketch of this batching follows after this message).
We keep sk->sk_forward_alloc values as small as possible, to meet
the memcg page-granularity constraint.
Note that memcg already has a per-cpu cache, although MEMCG_CHARGE_BATCH
is defined as 32 pages, which seems a bit small.
Note that while this cover letter mentions TCP, this work is generic
and supports TCP, UDP, DECNET, and SCTP.
====================
Link: https://lore.kernel.org/r/20220609063412.2205738-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
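
The per-CPU reserve described in the cover letter can be illustrated with a small standalone program. The sketch below is not the kernel code: the names (mem_charge, mem_uncharge, pcpu_reserve), the single-threaded stand-in for per-CPU variables, and the 4 KB page size are all assumptions made for the example. It shows how small charges and uncharges stay in a local reserve, and how the shared counter is only written when a reserve drifts past the 256-page threshold:

```c
/* Standalone sketch of per-CPU batched accounting against a shared
 * counter; hypothetical names, single-threaded stand-in for per-CPU
 * variables. Build: cc -std=c11 -o pcpu_batch pcpu_batch.c
 */
#include <stdatomic.h>
#include <stdio.h>

#define NR_CPUS		4
#define PCPU_RESERVE	256	/* pages: 1 MB with 4 KB pages */

static atomic_long memory_allocated;	/* shared, "tcp_memory_allocated" */
static long pcpu_reserve[NR_CPUS];	/* per-CPU forward-alloc cache */

/* Charge @pages on @cpu; the shared counter is only touched when the
 * local reserve exceeds +PCPU_RESERVE, so most charges stay local. */
static void mem_charge(int cpu, long pages)
{
	pcpu_reserve[cpu] += pages;
	if (pcpu_reserve[cpu] >= PCPU_RESERVE) {
		atomic_fetch_add(&memory_allocated, pcpu_reserve[cpu]);
		pcpu_reserve[cpu] = 0;
	}
}

/* Uncharge batches the same way, draining once the reserve dips
 * below -PCPU_RESERVE. */
static void mem_uncharge(int cpu, long pages)
{
	pcpu_reserve[cpu] -= pages;
	if (pcpu_reserve[cpu] <= -PCPU_RESERVE) {
		atomic_fetch_add(&memory_allocated, pcpu_reserve[cpu]);
		pcpu_reserve[cpu] = 0;
	}
}

int main(void)
{
	/* Many small charge/uncharge pairs never cross the threshold,
	 * so the shared cache line is never dirtied. */
	for (long i = 0; i < 100000; i++) {
		mem_charge((int)(i % NR_CPUS), 8);
		mem_uncharge((int)(i % NR_CPUS), 8);
	}
	printf("after small pairs:    shared = %ld pages\n",
	       atomic_load(&memory_allocated));

	/* A 300-page burst exceeds the reserve and is folded into the
	 * shared counter in a single atomic operation. */
	mem_charge(0, 300);
	printf("after 300-page burst: shared = %ld pages\n",
	       atomic_load(&memory_allocated));
	return 0;
}
```

The +/-1 MB hysteresis is what bounds the "inflation" of the series title: the shared counter can differ from the true total by at most 256 pages per CPU, while the hot path almost never writes to its cache line.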
Diffstat (limited to 'net/ipv4/tcp_timer.c')
| -rw-r--r-- | net/ipv4/tcp_timer.c | 19 |
1 file changed, 4 insertions(+), 15 deletions(-)
```diff
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 20cf4a98c69d..2208755e8efc 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -290,15 +290,13 @@ void tcp_delack_timer_handler(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
-	sk_mem_reclaim_partial(sk);
-
 	if (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) ||
 	    !(icsk->icsk_ack.pending & ICSK_ACK_TIMER))
-		goto out;
+		return;
 
 	if (time_after(icsk->icsk_ack.timeout, jiffies)) {
 		sk_reset_timer(sk, &icsk->icsk_delack_timer,
 			       icsk->icsk_ack.timeout);
-		goto out;
+		return;
 	}
 	icsk->icsk_ack.pending &= ~ICSK_ACK_TIMER;
@@ -317,10 +315,6 @@ void tcp_delack_timer_handler(struct sock *sk)
 		tcp_send_ack(sk);
 		__NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKS);
 	}
-
-out:
-	if (tcp_under_memory_pressure(sk))
-		sk_mem_reclaim(sk);
 }
 
 static void tcp_delack_timer(struct timer_list *t)
@@ -600,11 +594,11 @@ void tcp_write_timer_handler(struct sock *sk)
 	if (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) ||
 	    !icsk->icsk_pending)
-		goto out;
+		return;
 
 	if (time_after(icsk->icsk_timeout, jiffies)) {
 		sk_reset_timer(sk, &icsk->icsk_retransmit_timer,
 			       icsk->icsk_timeout);
-		goto out;
+		return;
 	}
 
 	tcp_mstamp_refresh(tcp_sk(sk));
@@ -626,9 +620,6 @@ void tcp_write_timer_handler(struct sock *sk)
 		tcp_probe_timer(sk);
 		break;
 	}
-
-out:
-	sk_mem_reclaim(sk);
 }
 
 static void tcp_write_timer(struct timer_list *t)
@@ -743,8 +734,6 @@ static void tcp_keepalive_timer (struct timer_list *t)
 		elapsed = keepalive_time_when(tp) - elapsed;
 	}
 
-	sk_mem_reclaim(sk);
-
 resched:
 	inet_csk_reset_keepalive_timer (sk, elapsed);
 	goto out;
```
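
The removed sk_mem_reclaim*() calls above are exactly the "potentially rare events" the cover letter mentions: once the series keeps sk->sk_forward_alloc near zero and batches global updates per CPU, the timer handlers have nothing left to hand back, and the out: labels collapse into early returns. The helper at the core of that batching looks roughly like the sketch below; it is a simplified paraphrase of the series' sk_memory_allocated_add(), with field and constant names (per_cpu_fw_alloc, SK_MEMORY_PCPU_RESERVE) taken from the patches but details approximate, not verbatim upstream code:

```c
/* Rough paraphrase (kernel context) of how charges are batched per
 * CPU before hitting the shared memory_allocated counter. */
static inline void sk_memory_allocated_add(struct sock *sk, int amt)
{
	int local_reserve;

	preempt_disable();
	local_reserve = __this_cpu_add_return(*sk->sk_prot->per_cpu_fw_alloc, amt);
	if (local_reserve >= SK_MEMORY_PCPU_RESERVE) {
		/* Fold the whole local reserve into the shared counter
		 * in one atomic op, instead of on every charge. */
		__this_cpu_sub(*sk->sk_prot->per_cpu_fw_alloc, local_reserve);
		atomic_long_add(local_reserve, sk->sk_prot->memory_allocated);
	}
	preempt_enable();
}
```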
