<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/net, branch v3.18.67</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.67</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.67'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-08-25T00:00:09Z</updated>
<entry>
<title>netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister</title>
<updated>2017-08-25T00:00:09Z</updated>
<author>
<name>Liping Zhang</name>
<email>zlpnobody@gmail.com</email>
</author>
<published>2017-03-25T08:35:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5776984fafdc8b18fb6ddd8264e6cc274b8ef349'/>
<id>urn:sha1:5776984fafdc8b18fb6ddd8264e6cc274b8ef349</id>
<content type='text'>
commit 9c3f3794926a997b1cab6c42480ff300efa2d162 upstream.

If one cpu is doing nf_ct_extend_unregister while another cpu is doing
__nf_ct_ext_add_length, then we may hit BUG_ON(t == NULL). Moreover,
there's no synchronize_rcu invocation after set nf_ct_ext_types[id] to
NULL, so it's possible that we may access invalid pointer.

But actually, most of the ct extends are built-in, so the problem listed
above will not happen. However, there are two exceptions: NF_CT_EXT_NAT
and NF_CT_EXT_SYNPROXY.

For _EXT_NAT, the panic will not happen, since adding the nat extend and
unregistering the nat extend are located in the same file(nf_nat_core.c),
this means that after the nat module is removed, we cannot add the nat
extend too.

For _EXT_SYNPROXY, synproxy extend may be added by init_conntrack, while
synproxy extend unregister will be done by synproxy_core_exit. So after
nf_synproxy_core.ko is removed, we may still try to add the synproxy
extend, then kernel panic may happen.

I know it's very hard to reproduce this issue, but I can play a tricky
game to make it happen very easily :)

Step 1. Enable SYNPROXY for tcp dport 1234 at FORWARD hook:
  # iptables -I FORWARD -p tcp --dport 1234 -j SYNPROXY
Step 2. Queue the syn packet to the userspace at raw table OUTPUT hook.
        Also note, in the userspace we only add a 20s' delay, then
        reinject the syn packet to the kernel:
  # iptables -t raw -I OUTPUT -p tcp --syn -j NFQUEUE --queue-num 1
Step 3. Using "nc 2.2.2.2 1234" to connect the server.
Step 4. Now remove the nf_synproxy_core.ko quickly:
  # iptables -F FORWARD
  # rmmod ipt_SYNPROXY
  # rmmod nf_synproxy_core
Step 5. After 20s' delay, the syn packet is reinjected to the kernel.

Now you will see the panic like this:
  kernel BUG at net/netfilter/nf_conntrack_extend.c:91!
  Call Trace:
   ? __nf_ct_ext_add_length+0x53/0x3c0 [nf_conntrack]
   init_conntrack+0x12b/0x600 [nf_conntrack]
   nf_conntrack_in+0x4cc/0x580 [nf_conntrack]
   ipv4_conntrack_local+0x48/0x50 [nf_conntrack_ipv4]
   nf_reinject+0x104/0x270
   nfqnl_recv_verdict+0x3e1/0x5f9 [nfnetlink_queue]
   ? nfqnl_recv_verdict+0x5/0x5f9 [nfnetlink_queue]
   ? nla_parse+0xa0/0x100
   nfnetlink_rcv_msg+0x175/0x6a9 [nfnetlink]
   [...]

One possible solution is to make NF_CT_EXT_SYNPROXY extend built-in, i.e.
introduce nf_conntrack_synproxy.c and only do ct extend register and
unregister in it, similar to nf_conntrack_timeout.c.

But having such a obscure restriction of nf_ct_extend_unregister is not a
good idea, so we should invoke synchronize_rcu after set nf_ct_ext_types
to NULL, and check the NULL pointer when do __nf_ct_ext_add_length. Then
it will be easier if we add new ct extend in the future.

Last, we use kfree_rcu to free nf_ct_ext, so rcu_barrier() is unnecessary
anymore, remove it too.

Signed-off-by: Liping Zhang &lt;zlpnobody@gmail.com&gt;
Acked-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: Stefan Bader &lt;stefan.bader@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output</title>
<updated>2017-08-13T02:24:19Z</updated>
<author>
<name>zheng li</name>
<email>james.z.li@ericsson.com</email>
</author>
<published>2016-12-12T01:56:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6b58ee6bb2fc8f507602656ea7072e546101337b'/>
<id>urn:sha1:6b58ee6bb2fc8f507602656ea7072e546101337b</id>
<content type='text'>
commit 0a28cfd51e17f4f0a056bcf66bfbe492c3b99f38 upstream.

There is an inconsistent conditional judgement in __ip_append_data and
ip_finish_output functions, the variable length in __ip_append_data just
include the length of application's payload and udp header, don't include
the length of ip header, but in ip_finish_output use
(skb-&gt;len &gt; ip_skb_dst_mtu(skb)) as judgement, and skb-&gt;len include the
length of ip header.

That causes some particular application's udp payload whose length is
between (MTU - IP Header) and MTU were fragmented by ip_fragment even
though the rst-&gt;dev support UFO feature.

Add the length of ip header to length in __ip_append_data to keep
consistent conditional judgement as ip_finish_output for ip fragment.

Signed-off-by: Zheng Li &lt;james.z.li@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>udp: consistently apply ufo or fragmentation</title>
<updated>2017-08-13T02:24:19Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2017-08-10T16:29:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4ac8dc208caf85675f0f745783e0a3f88dac0008'/>
<id>urn:sha1:4ac8dc208caf85675f0f745783e0a3f88dac0008</id>
<content type='text'>
[ Upstream commit 85f1bd9a7b5a79d5baa8bf44af19658f7bf77bfa ]

When iteratively building a UDP datagram with MSG_MORE and that
datagram exceeds MTU, consistently choose UFO or fragmentation.

Once skb_is_gso, always apply ufo. Conversely, once a datagram is
split across multiple skbs, do not consider ufo.

Sendpage already maintains the first invariant, only add the second.
IPv6 does not have a sendpage implementation to modify.

A gso skb must have a partial checksum, do not follow sk_no_check_tx
in udp_send_skb.

Found by syzkaller.

[gregkh - tweaks for 3.18 for ipv6, hopefully they are correct...]

Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Reported-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>revert "ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output"</title>
<updated>2017-08-13T02:24:19Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2017-08-11T16:19:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ddc3da581617b98832e5d108a95f1face998b618'/>
<id>urn:sha1:ddc3da581617b98832e5d108a95f1face998b618</id>
<content type='text'>
This reverts commit f102bb7164c9020e12662998f0fd99c3be72d4f6 which is
commit 0a28cfd51e17f4f0a056bcf66bfbe492c3b99f38 upstream as there is
another patch that needs to be applied instead of this one.

Cc: Zheng Li &lt;james.z.li@ericsson.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>packet: fix tp_reserve race in packet_set_ring</title>
<updated>2017-08-13T02:24:19Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2017-08-10T16:41:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f2ce502f866556d24ebfae84673c9ef211b79906'/>
<id>urn:sha1:f2ce502f866556d24ebfae84673c9ef211b79906</id>
<content type='text'>
[ Upstream commit c27927e372f0785f3303e8fad94b85945e2c97b7 ]

Updates to tp_reserve can race with reads of the field in
packet_set_ring. Avoid this by holding the socket lock during
updates in setsockopt PACKET_RESERVE.

This bug was discovered by syzkaller.

Fixes: 8913336a7e8d ("packet: add PACKET_RESERVE sockopt")
Reported-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: avoid skb_warn_bad_offload false positives on UFO</title>
<updated>2017-08-13T02:24:19Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2017-08-08T18:22:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=441729770537c6efe5ad862594131f02d472866b'/>
<id>urn:sha1:441729770537c6efe5ad862594131f02d472866b</id>
<content type='text'>
[ Upstream commit 8d63bee643f1fb53e472f0e135cae4eb99d62d19 ]

skb_warn_bad_offload triggers a warning when an skb enters the GSO
stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
checksum offload set.

Commit b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
observed that SKB_GSO_DODGY producers can trigger the check and
that passing those packets through the GSO handlers will fix it
up. But, the software UFO handler will set ip_summed to
CHECKSUM_NONE.

When __skb_gso_segment is called from the receive path, this
triggers the warning again.

Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
Tx these two are equivalent. On Rx, this better matches the
skb state (checksum computed), as CHECKSUM_NONE here means no
checksum computed.

See also this thread for context:
http://patchwork.ozlabs.org/patch/799015/

Fixes: b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: fastopen: tcp_connect() must refresh the route</title>
<updated>2017-08-13T02:24:19Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2017-08-08T08:41:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=390604b92fcdc43316e7422889591d649e46b5af'/>
<id>urn:sha1:390604b92fcdc43316e7422889591d649e46b5af</id>
<content type='text'>
[ Upstream commit 8ba60924710cde564a3905588b6219741d6356d0 ]

With new TCP_FASTOPEN_CONNECT socket option, there is a possibility
to call tcp_connect() while socket sk_dst_cache is either NULL
or invalid.

 +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
 +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
 +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
 +0 connect(4, ..., ...) = 0

&lt;&lt; sk-&gt;sk_dst_cache becomes obsolete, or even set to NULL &gt;&gt;

 +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000

We need to refresh the route otherwise bad things can happen,
especially when syzkaller is running on the host :/

Fixes: 19f6d3f3c8422 ("net/tcp-fastopen: Add new API support")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Wei Wang &lt;weiwan@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Wei Wang &lt;weiwan@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target</title>
<updated>2017-08-13T02:24:19Z</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2017-08-09T10:15:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=74926bfeaf9bab3f6a6bedaef5ff79d32bd38c1a'/>
<id>urn:sha1:74926bfeaf9bab3f6a6bedaef5ff79d32bd38c1a</id>
<content type='text'>
[ Upstream commit 96d9703050a0036a3360ec98bb41e107c90664fe ]

Commit 55917a21d0cc ("netfilter: x_tables: add context to know if
extension runs from nft_compat") introduced a member nft_compat to
xt_tgchk_param structure.

But it didn't set it's value for ipt_init_target. With unexpected
value in par.nft_compat, it may return unexpected result in some
target's checkentry.

This patch is to set all it's fields as 0 and only initialize the
non-zero fields in ipt_init_target.

v1-&gt;v2:
  As Wang Cong's suggestion, fix it by setting all it's fields as
  0 and only initializing the non-zero fields.

Fixes: 55917a21d0cc ("netfilter: x_tables: add context to know if extension runs from nft_compat")
Suggested-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: fix keepalive code vs TCP_FASTOPEN_CONNECT</title>
<updated>2017-08-13T02:24:18Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2017-08-03T06:10:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=52b0231eca422141324611d2e06979ea9a4755cd'/>
<id>urn:sha1:52b0231eca422141324611d2e06979ea9a4755cd</id>
<content type='text'>
[ Upstream commit 2dda640040876cd8ae646408b69eea40c24f9ae9 ]

syzkaller was able to trigger a divide by 0 in TCP stack [1]

Issue here is that keepalive timer needs to be updated to not attempt
to send a probe if the connection setup was deferred using
TCP_FASTOPEN_CONNECT socket option added in linux-4.11

[1]
 divide error: 0000 [#1] SMP
 CPU: 18 PID: 0 Comm: swapper/18 Not tainted
 task: ffff986f62f4b040 ti: ffff986f62fa2000 task.ti: ffff986f62fa2000
 RIP: 0010:[&lt;ffffffff8409cc0d&gt;]  [&lt;ffffffff8409cc0d&gt;] __tcp_select_window+0x8d/0x160
 Call Trace:
  &lt;IRQ&gt;
  [&lt;ffffffff8409d951&gt;] tcp_transmit_skb+0x11/0x20
  [&lt;ffffffff8409da21&gt;] tcp_xmit_probe_skb+0xc1/0xe0
  [&lt;ffffffff840a0ee8&gt;] tcp_write_wakeup+0x68/0x160
  [&lt;ffffffff840a151b&gt;] tcp_keepalive_timer+0x17b/0x230
  [&lt;ffffffff83b3f799&gt;] call_timer_fn+0x39/0xf0
  [&lt;ffffffff83b40797&gt;] run_timer_softirq+0x1d7/0x280
  [&lt;ffffffff83a04ddb&gt;] __do_softirq+0xcb/0x257
  [&lt;ffffffff83ae03ac&gt;] irq_exit+0x9c/0xb0
  [&lt;ffffffff83a04c1a&gt;] smp_apic_timer_interrupt+0x6a/0x80
  [&lt;ffffffff83a03eaf&gt;] apic_timer_interrupt+0x7f/0x90
  &lt;EOI&gt;
  [&lt;ffffffff83fed2ea&gt;] ? cpuidle_enter_state+0x13a/0x3b0
  [&lt;ffffffff83fed2cd&gt;] ? cpuidle_enter_state+0x11d/0x3b0

Tested:

Following packetdrill no longer crashes the kernel

`echo 0 &gt;/proc/sys/net/ipv4/tcp_timestamps`

// Cache warmup: send a Fast Open cookie request
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
   +0 setsockopt(3, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
   +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation is now in progress)
   +0 &gt; S 0:0(0) &lt;mss 1460,nop,nop,sackOK,nop,wscale 8,FO,nop,nop&gt;
 +.01 &lt; S. 123:123(0) ack 1 win 14600 &lt;mss 1460,nop,nop,sackOK,nop,wscale 6,FO abcd1234,nop,nop&gt;
   +0 &gt; . 1:1(0) ack 1
   +0 close(3) = 0
   +0 &gt; F. 1:1(0) ack 1
   +0 &lt; F. 1:1(0) ack 2 win 92
   +0 &gt; .  2:2(0) ack 2

   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
   +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
   +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
   +0 setsockopt(4, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
 +.01 connect(4, ..., ...) = 0
   +0 setsockopt(4, SOL_TCP, TCP_KEEPIDLE, [5], 4) = 0
   +10 close(4) = 0

`echo 1 &gt;/proc/sys/net/ipv4/tcp_timestamps`

Fixes: 19f6d3f3c842 ("net/tcp-fastopen: Add new API support")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Wei Wang &lt;weiwan@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states</title>
<updated>2017-08-13T02:24:18Z</updated>
<author>
<name>Yuchung Cheng</name>
<email>ycheng@google.com</email>
</author>
<published>2017-08-01T20:22:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0864b11104a8d2978b43800e9f5168a59a23b3d5'/>
<id>urn:sha1:0864b11104a8d2978b43800e9f5168a59a23b3d5</id>
<content type='text'>
[ Upstream commit ed254971edea92c3ac5c67c6a05247a92aa6075e ]

If the sender switches the congestion control during ECN-triggered
cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to
the ssthresh value calculated by the previous congestion control. If
the previous congestion control is BBR that always keep ssthresh
to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe
step is to avoid assigning invalid ssthresh value when recovery ends.

Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
