<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/net, branch v4.14.7</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.7</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.7'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-12-17T14:07:58Z</updated>
<entry>
<title>net: accept UFO datagrams from tuntap and packet</title>
<updated>2017-12-17T14:07:58Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2017-11-21T15:22:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=60335608e2f1ad6e55acaa4f13bd0f3bcd156dbd'/>
<id>urn:sha1:60335608e2f1ad6e55acaa4f13bd0f3bcd156dbd</id>
<content type='text'>
[ Upstream commit 0c19f846d582af919db66a5914a0189f9f92c936 ]

Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.

Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.

It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").

(*) To avoid having to bring back skb_shinfo(skb)-&gt;ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.

Tested
  Booted a v4.13 guest kernel with QEMU. On a host kernel before this
  patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
  enabled, same as on a v4.13 host kernel.

  A UFO packet sent from the guest appears on the tap device:
    host:
      nc -l -p -u 8000 &amp;
      tcpdump -n -i tap0

    guest:
      dd if=/dev/zero of=payload.txt bs=1 count=2000
      nc -u 192.16.1.1 8000 &lt; payload.txt

  Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
  packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes
  v1 -&gt; v2
    - simplified set_offload change (review comment)
    - documented test procedure

Link: http://lkml.kernel.org/r/&lt;CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com&gt;
Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Jason Wang &lt;jasowang@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: when scheduling TLP, time of RTO should account for current ACK</title>
<updated>2017-12-17T14:07:58Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2017-11-18T02:06:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=241eb29c019a0b4e2a3ff5ca4b4449374aeb5f87'/>
<id>urn:sha1:241eb29c019a0b4e2a3ff5ca4b4449374aeb5f87</id>
<content type='text'>
[ Upstream commit ed66dfaf236c04d414de1d218441296e57fb2bd2 ]

Fix the TLP scheduling logic so that when scheduling a TLP probe, we
ensure that the estimated time at which an RTO would fire accounts for
the fact that ACKs indicating forward progress should push back RTO
times.

After the following fix:

df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")

we had an unintentional behavior change in the following kind of
scenario: suppose the RTT variance has been very low recently. Then
suppose we send out a flight of N packets and our RTT is 100ms:

t=0: send a flight of N packets
t=100ms: receive an ACK for N-1 packets

The response before df92c8394e6e that was:
  -&gt; schedule a TLP for now + RTO_interval

The response after df92c8394e6e is:
  -&gt; schedule a TLP for t=0 + RTO_interval

Since RTO_interval = srtt + RTT_variance, this means that we have
scheduled a TLP timer at a point in the future that only accounts for
RTT_variance. If the RTT_variance term is small, this means that the
timer fires soon.

Before df92c8394e6e this would not happen, because in that code, when
we receive an ACK for a prefix of flight, we did:

    1) Near the top of tcp_ack(), switch from TLP timer to RTO
       at write_queue_head-&gt;paket_tx_time + RTO_interval:
            if (icsk-&gt;icsk_pending == ICSK_TIME_LOSS_PROBE)
                   tcp_rearm_rto(sk);

    2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval:
            if (flag &amp; FLAG_ACKED) {
                   tcp_rearm_rto(sk);

    3) In tcp_ack() after tcp_fastretrans_alert() switch from RTO
       to TLP at now + RTO_interval:
            if (icsk-&gt;icsk_pending == ICSK_TIME_RETRANS)
                   tcp_schedule_loss_probe(sk);

In df92c8394e6e we removed that 3-phase dance, and instead directly
set the TLP timer once: we set the TLP timer in cases like this to
write_queue_head-&gt;packet_tx_time + RTO_interval. So if the RTT
variance is small, then this means that this is setting the TLP timer
to fire quite soon. This means if the ACK for the tail of the flight
takes longer than an RTT to arrive (often due to delayed ACKs), then
the TLP timer fires too quickly.

Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()</title>
<updated>2017-12-17T14:07:57Z</updated>
<author>
<name>David Ahern</name>
<email>dsahern@gmail.com</email>
</author>
<published>2017-12-03T17:33:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=efb7dc5a9658de725de8d0b2016dbe198406d93e'/>
<id>urn:sha1:efb7dc5a9658de725de8d0b2016dbe198406d93e</id>
<content type='text'>
[ Usptream commit b4d1605a8ea608fd7dc45b926a05d75d340bde4b ]

After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"),
socket lookups happen while skb-&gt;cb[] has not been mangled yet by TCP.

Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev")
Signed-off-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: remove hlist_nulls_add_tail_rcu()</title>
<updated>2017-12-17T14:07:57Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2017-12-05T20:45:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6878a06356a36d69b18916d2f0c483f21e0885d5'/>
<id>urn:sha1:6878a06356a36d69b18916d2f0c483f21e0885d5</id>
<content type='text'>
[ Upstream commit d7efc6c11b277d9d80b99b1334a78bfe7d7edf10 ]

Alexander Potapenko reported use of uninitialized memory [1]

This happens when inserting a request socket into TCP ehash,
in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized.

Bug was added by commit d894ba18d4e4 ("soreuseport: fix ordering for
mixed v4/v6 sockets")

Note that d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6
ordering fix") missed the opportunity to get rid of
hlist_nulls_add_tail_rcu() :

Both UDP sockets and TCP/DCCP listeners no longer use
__sk_nulls_add_node_rcu() for their hash insertion.

Since all other sockets have unique 4-tuple, the reuseport status
has no special meaning, so we can always use hlist_nulls_add_head_rcu()
for them and save few cycles/instructions.

[1]

==================================================================
BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 &lt;IRQ&gt;
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x185/0x1d0 lib/dump_stack.c:52
 kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016
 __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766
 __sk_nulls_add_node_rcu ./include/net/sock.h:684
 inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413
 reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754
 inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765
 tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414
 tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314
 tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917
 tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483
 tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763
 ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216
 NF_HOOK ./include/linux/netfilter.h:248
 ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257
 dst_input ./include/net/dst.h:477
 ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397
 NF_HOOK ./include/linux/netfilter.h:248
 ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488
 __netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298
 __netif_receive_skb net/core/dev.c:4336
 netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497
 napi_skb_finish net/core/dev.c:4858
 napi_gro_receive+0x629/0xa50 net/core/dev.c:4889
 e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018
 e1000_clean_rx_irq+0x1492/0x1d30
drivers/net/ethernet/intel/e1000/e1000_main.c:4474
 e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819
 napi_poll net/core/dev.c:5500
 net_rx_action+0x73c/0x1820 net/core/dev.c:5566
 __do_softirq+0x4b4/0x8dd kernel/softirq.c:284
 invoke_softirq kernel/softirq.c:364
 irq_exit+0x203/0x240 kernel/softirq.c:405
 exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638
 do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263
 common_interrupt+0x86/0x86

Fixes: d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets")
Fixes: d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Alexander Potapenko &lt;glider@google.com&gt;
Acked-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>uapi: fix linux/tls.h userspace compilation error</title>
<updated>2017-11-30T08:40:40Z</updated>
<author>
<name>Dmitry V. Levin</name>
<email>ldv@altlinux.org</email>
</author>
<published>2017-11-14T03:30:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=33e58deefa0907ffba4d5a4c69f3641a649358d3'/>
<id>urn:sha1:33e58deefa0907ffba4d5a4c69f3641a649358d3</id>
<content type='text'>
commit b9f3eb499d84f8d4adcb2f9212ec655700b28228 upstream.

Move inclusion of a private kernel header &lt;net/tcp.h&gt;
from uapi/linux/tls.h to its only user - net/tls.h,
to fix the following linux/tls.h userspace compilation error:

/usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory

As to this point uapi/linux/tls.h was totaly unusuable for userspace,
cleanup this header file further by moving other redundant includes
to net/tls.h.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Dmitry V. Levin &lt;ldv@altlinux.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>net_sched: introduce tcf_exts_get_net() and tcf_exts_put_net()</title>
<updated>2017-11-09T01:03:09Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2017-11-06T21:47:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e4b95c41df36befcfd117210900cd790bc2cd048'/>
<id>urn:sha1:e4b95c41df36befcfd117210900cd790bc2cd048</id>
<content type='text'>
Instead of holding netns refcnt in tc actions, we can minimize
the holding time by saving it in struct tcf_exts instead. This
means we can just hold netns refcnt right before call_rcu() and
release it after tcf_exts_destroy() is done.

However, because on netns cleanup path we call tcf_proto_destroy()
too, obviously we can not hold netns for a zero refcnt, in this
case we have to do cleanup synchronously. It is fine for RCU too,
the caller cleanup_net() already waits for a grace period.

For other cases, refcnt is non-zero and we can safely grab it as
normal and release it after we are done.

This patch provides two new API for each filter to use:
tcf_exts_get_net() and tcf_exts_put_net(). And all filters now can
use the following pattern:

void __destroy_filter() {
  tcf_exts_destroy();
  tcf_exts_put_net();  // &lt;== release netns refcnt
  kfree();
}
void some_work() {
  rtnl_lock();
  __destroy_filter();
  rtnl_unlock();
}
void some_rcu_callback() {
  tcf_queue_work(some_work);
}

if (tcf_exts_get_net())  // &lt;== hold netns refcnt
  call_rcu(some_rcu_callback);
else
  __destroy_filter();

Cc: Lucas Bates &lt;lucasb@mojatatu.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: Jiri Pirko &lt;jiri@resnulli.us&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Revert "net_sched: hold netns refcnt for each action"</title>
<updated>2017-11-09T01:03:09Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2017-11-06T21:47:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c7e460ce55724d4e4e22d3126e5c47273819c53a'/>
<id>urn:sha1:c7e460ce55724d4e4e22d3126e5c47273819c53a</id>
<content type='text'>
This reverts commit ceffcc5e254b450e6159f173e4538215cebf1b59.
If we hold that refcnt, the netns can never be destroyed until
all actions are destroyed by user, this breaks our netns design
which we expect all actions are destroyed when we destroy the
whole netns.

Cc: Lucas Bates &lt;lucasb@mojatatu.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: Jiri Pirko &lt;jiri@resnulli.us&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2017-11-03T16:09:21Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-11-03T16:09:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7ba3ebff9c09fa461e7c2d186b81f7af61910abe'/>
<id>urn:sha1:7ba3ebff9c09fa461e7c2d186b81f7af61910abe</id>
<content type='text'>
Pull networking fixes from David Miller:
 "Hopefully this is the last batch of networking fixes for 4.14

  Fingers crossed...

   1) Fix stmmac to use the proper sized OF property read, from Bhadram
      Varka.

   2) Fix use after free in net scheduler tc action code, from Cong
      Wang.

   3) Fix SKB control block mangling in tcp_make_synack().

   4) Use proper locking in fib_dump_info(), from Florian Westphal.

   5) Fix IPG encodings in systemport driver, from Florian Fainelli.

   6) Fix division by zero in NV TCP congestion control module, from
      Konstantin Khlebnikov.

   7) Fix use after free in nf_reject_ipv4, from Tejaswi Tanikella"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: systemport: Correct IPG length settings
  tcp: do not mangle skb-&gt;cb[] in tcp_make_synack()
  fib: fib_dump_info can no longer use __in_dev_get_rtnl
  stmmac: use of_property_read_u32 instead of read_u8
  net_sched: hold netns refcnt for each action
  net_sched: acquire RTNL in tc_action_net_exit()
  net: vrf: correct FRA_L3MDEV encode type
  tcp_nv: fix division by zero in tcpnv_acked()
  netfilter: nf_reject_ipv4: Fix use-after-free in send_reset
  netfilter: nft_set_hash: disable fast_ops for 2-len keys
</content>
</entry>
<entry>
<title>net_sched: hold netns refcnt for each action</title>
<updated>2017-11-03T01:30:38Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2017-11-01T17:23:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ceffcc5e254b450e6159f173e4538215cebf1b59'/>
<id>urn:sha1:ceffcc5e254b450e6159f173e4538215cebf1b59</id>
<content type='text'>
TC actions have been destroyed asynchronously for a long time,
previously in a RCU callback and now in a workqueue. If we
don't hold a refcnt for its netns, we could use the per netns
data structure, struct tcf_idrinfo, after it has been freed by
netns workqueue.

Hold refcnt to ensure netns destroy happens after all actions
are gone.

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Reported-by: Lucas Bates &lt;lucasb@mojatatu.com&gt;
Tested-by: Lucas Bates &lt;lucasb@mojatatu.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: Jiri Pirko &lt;jiri@resnulli.us&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net_sched: acquire RTNL in tc_action_net_exit()</title>
<updated>2017-11-03T01:30:38Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2017-11-01T17:23:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a159d3c4b8291998c018f0dbddd4678315264a1e'/>
<id>urn:sha1:a159d3c4b8291998c018f0dbddd4678315264a1e</id>
<content type='text'>
I forgot to acquire RTNL in tc_action_net_exit()
which leads that action ops-&gt;cleanup() is not always
called with RTNL. This usually is not a big deal because
this function is called after all netns refcnt are gone,
but given RTNL protects more than just actions, add it
for safety and consistency.

Also add an assertion to catch other potential bugs.

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Reported-by: Lucas Bates &lt;lucasb@mojatatu.com&gt;
Tested-by: Lucas Bates &lt;lucasb@mojatatu.com&gt;
Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: Jiri Pirko &lt;jiri@resnulli.us&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
