<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/net/ip_fib.h, branch v4.14.117</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.117</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.117'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2019-01-31T07:13:42Z</updated>
<entry>
<title>net: ipv4: Fix memory leak in network namespace dismantle</title>
<updated>2019-01-31T07:13:42Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2019-01-09T09:57:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0781b0f9b5ea0f7dff67e9950f3fcaa3fc6e67c4'/>
<id>urn:sha1:0781b0f9b5ea0f7dff67e9950f3fcaa3fc6e67c4</id>
<content type='text'>
[ Upstream commit f97f4dd8b3bb9d0993d2491e0f22024c68109184 ]

IPv4 routing tables are flushed in two cases:

1. In response to events in the netdev and inetaddr notification chains
2. When a network namespace is being dismantled

In both cases only routes associated with a dead nexthop group are
flushed. However, a nexthop group will only be marked as dead in case it
is populated with actual nexthops using a nexthop device. This is not
the case when the route in question is an error route (e.g.,
'blackhole', 'unreachable').

Therefore, when a network namespace is being dismantled such routes are
not flushed and leaked [1].

To reproduce:
# ip netns add blue
# ip -n blue route add unreachable 192.0.2.0/24
# ip netns del blue

Fix this by not skipping error routes that are not marked with
RTNH_F_DEAD when flushing the routing tables.

To prevent the flushing of such routes in case #1, add a parameter to
fib_table_flush() that indicates if the table is flushed as part of
namespace dismantle or not.

Note that this problem does not exist in IPv6 since error routes are
associated with the loopback device.

[1]
unreferenced object 0xffff888066650338 (size 56):
  comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff  ..........ba....
    e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00  ...d............
  backtrace:
    [&lt;00000000856ed27d&gt;] inet_rtm_newroute+0x129/0x220
    [&lt;00000000fcdfc00a&gt;] rtnetlink_rcv_msg+0x397/0xa20
    [&lt;00000000cb85801a&gt;] netlink_rcv_skb+0x132/0x380
    [&lt;00000000ebc991d2&gt;] netlink_unicast+0x4c0/0x690
    [&lt;0000000014f62875&gt;] netlink_sendmsg+0x929/0xe10
    [&lt;00000000bac9d967&gt;] sock_sendmsg+0xc8/0x110
    [&lt;00000000223e6485&gt;] ___sys_sendmsg+0x77a/0x8f0
    [&lt;000000002e94f880&gt;] __sys_sendmsg+0xf7/0x250
    [&lt;00000000ccb1fa72&gt;] do_syscall_64+0x14d/0x610
    [&lt;00000000ffbe3dae&gt;] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [&lt;000000003a8b605b&gt;] 0xffffffffffffffff
unreferenced object 0xffff888061621c88 (size 48):
  comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
  hex dump (first 32 bytes):
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff  kkkkkkkk..&amp;_....
  backtrace:
    [&lt;00000000733609e3&gt;] fib_table_insert+0x978/0x1500
    [&lt;00000000856ed27d&gt;] inet_rtm_newroute+0x129/0x220
    [&lt;00000000fcdfc00a&gt;] rtnetlink_rcv_msg+0x397/0xa20
    [&lt;00000000cb85801a&gt;] netlink_rcv_skb+0x132/0x380
    [&lt;00000000ebc991d2&gt;] netlink_unicast+0x4c0/0x690
    [&lt;0000000014f62875&gt;] netlink_sendmsg+0x929/0xe10
    [&lt;00000000bac9d967&gt;] sock_sendmsg+0xc8/0x110
    [&lt;00000000223e6485&gt;] ___sys_sendmsg+0x77a/0x8f0
    [&lt;000000002e94f880&gt;] __sys_sendmsg+0xf7/0x250
    [&lt;00000000ccb1fa72&gt;] do_syscall_64+0x14d/0x610
    [&lt;00000000ffbe3dae&gt;] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [&lt;000000003a8b605b&gt;] 0xffffffffffffffff

Fixes: 8cced9eff1d4 ("[NETNS]: Enable routing configuration in non-initial namespace.")
Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: ipv4: update fnhe_pmtu when first hop's MTU changes</title>
<updated>2018-10-18T07:16:17Z</updated>
<author>
<name>Sabrina Dubroca</name>
<email>sd@queasysnail.net</email>
</author>
<published>2018-10-09T15:48:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9b4869cf385aa16f89c0f019eed4ec4e36aa441c'/>
<id>urn:sha1:9b4869cf385aa16f89c0f019eed4ec4e36aa441c</id>
<content type='text'>
[ Upstream commit af7d6cce53694a88d6a1bb60c9a239a6a5144459 ]

Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop
exceptions"), exceptions get deprecated separately from cached
routes. In particular, administrative changes don't clear PMTU anymore.

As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes
on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
the local MTU change can become stale:
 - if the local MTU is now lower than the PMTU, that PMTU is now
   incorrect
 - if the local MTU was the lowest value in the path, and is increased,
   we might discover a higher PMTU

Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those
cases.

If the exception was locked, the discovered PMTU was smaller than the
minimal accepted PMTU. In that case, if the new local MTU is smaller
than the current PMTU, let PMTU discovery figure out if locking of the
exception is still needed.

To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
notifier. By the time the notifier is called, dev-&gt;mtu has been
changed. This patch adds the old MTU as additional information in the
notifier structure, and a new call_netdevice_notifiers_u32() function.

Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions")
Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Reviewed-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ipv4: lock mtu in fnhe when received PMTU &lt; net.ipv4.route.min_pmtu</title>
<updated>2018-05-30T05:52:14Z</updated>
<author>
<name>Sabrina Dubroca</name>
<email>sd@queasysnail.net</email>
</author>
<published>2018-03-14T09:21:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8387fbac8e18e26a60559adc63e0b7067303b0a4'/>
<id>urn:sha1:8387fbac8e18e26a60559adc63e0b7067303b0a4</id>
<content type='text'>
[ Upstream commit d52e5a7e7ca49457dd31fc8b42fb7c0d58a31221 ]

Prior to the rework of PMTU information storage in commit
2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer."),
when a PMTU event advertising a PMTU smaller than
net.ipv4.route.min_pmtu was received, we would disable setting the DF
flag on packets by locking the MTU metric, and set the PMTU to
net.ipv4.route.min_pmtu.

Since then, we don't disable DF, and set PMTU to
net.ipv4.route.min_pmtu, so the intermediate router that has this link
with a small MTU will have to drop the packets.

This patch reestablishes pre-2.6.39 behavior by splitting
rtable-&gt;rt_pmtu into a bitfield with rt_mtu_locked and rt_pmtu.
rt_mtu_locked indicates that we shouldn't set the DF bit on that path,
and is checked in ip_dont_fragment().

One possible workaround is to set net.ipv4.route.min_pmtu to a value low
enough to accommodate the lowest MTU encountered.

Fixes: 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer.")
Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Reviewed-by: Stefano Brivio &lt;sbrivio@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: fib_rules: Implement notification logic in core</title>
<updated>2017-08-03T22:35:59Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2017-08-03T11:28:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1b2a4440858857f2f93bb2ec5bb3a60f4fcc25be'/>
<id>urn:sha1:1b2a4440858857f2f93bb2ec5bb3a60f4fcc25be</id>
<content type='text'>
Unlike the routing tables, the FIB rules share a common core, so instead
of replicating the same logic for each address family we can simply dump
the rules and send notifications from the core itself.

To protect the integrity of the dump, a rules-specific sequence counter
is added for each address family and incremented whenever a rule is
added or deleted (under RTNL).

Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: core: Make the FIB notification chain generic</title>
<updated>2017-08-03T22:35:59Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2017-08-03T11:28:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=04b1d4e50e82536c12da00ee04a77510c459c844'/>
<id>urn:sha1:04b1d4e50e82536c12da00ee04a77510c459c844</id>
<content type='text'>
The FIB notification chain is currently soley used by IPv4 code.
However, we're going to introduce IPv6 FIB offload support, which
requires these notification as well.

As explained in commit c3852ef7f2f8 ("ipv4: fib: Replay events when
registering FIB notifier"), upon registration to the chain, the callee
receives a full dump of the FIB tables and rules by traversing all the
net namespaces. The integrity of the dump is ensured by a per-namespace
sequence counter that is incremented whenever a change to the tables or
rules occurs.

In order to allow more address families to use the chain, each family is
expected to register its fib_notifier_ops in its pernet init. These
operations allow the common code to read the family's sequence counter
as well as dump its tables and rules in the given net namespace.

Additionally, a 'family' parameter is added to sent notifications, so
that listeners could distinguish between the different families.

Implement the common code that allows listeners to register to the chain
and for address families to register their fib_notifier_ops. Subsequent
patches will implement these operations in IPv6.

In the future, ipmr and ip6mr will be extended to provide these
notifications as well.

Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: fib: Remove unused functions</title>
<updated>2017-08-03T00:00:24Z</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@mellanox.com</email>
</author>
<published>2017-08-02T07:56:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2202e35d47fff379fc744e6bd3c111b018cc77df'/>
<id>urn:sha1:2202e35d47fff379fc744e6bd3c111b018cc77df</id>
<content type='text'>
Previous patches converted users of these functions to provide offload
indication using the nexthop's flags instead of the FIB info's.

Signed-off-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Acked-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net, ipv4: convert fib_info.fib_clntref from atomic_t to refcount_t</title>
<updated>2017-07-04T08:29:04Z</updated>
<author>
<name>Reshetova, Elena</name>
<email>elena.reshetova@intel.com</email>
</author>
<published>2017-07-04T06:35:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0029c0deb590bc18732327d53b8fdab12eec6d7c'/>
<id>urn:sha1:0029c0deb590bc18732327d53b8fdab12eec6d7c</id>
<content type='text'>
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova &lt;elena.reshetova@intel.com&gt;
Signed-off-by: Hans Liljestrand &lt;ishkamiel@gmail.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: David Windsor &lt;dwindsor@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: ipv4: Add extack message for invalid prefix or length</title>
<updated>2017-05-30T15:55:31Z</updated>
<author>
<name>David Ahern</name>
<email>dsahern@gmail.com</email>
</author>
<published>2017-05-27T22:19:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=78055998954b7a3e6c31eb24d1d26f0b63a7ec0d'/>
<id>urn:sha1:78055998954b7a3e6c31eb24d1d26f0b63a7ec0d</id>
<content type='text'>
Add extack error message for invalid prefix length and invalid prefix.
Example of the latter is a route spec containing 172.16.100.1/24, where
the /24 mask means the lower 8-bits should be 0. Amazing how easy that
one is to overlook when an EINVAL is returned.

Signed-off-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2017-05-27T00:46:35Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2017-05-27T00:46:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=34aa83c2fc23e055968387c8b78ac8bafd735aff'/>
<id>urn:sha1:34aa83c2fc23e055968387c8b78ac8bafd735aff</id>
<content type='text'>
Overlapping changes in drivers/net/phy/marvell.c, bug fix in 'net'
restricting a HW workaround alongside cleanups in 'net-next'.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv4: add reference counting to metrics</title>
<updated>2017-05-26T18:57:07Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2017-05-25T21:27:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3fb07daff8e99243366a081e5129560734de4ada'/>
<id>urn:sha1:3fb07daff8e99243366a081e5129560734de4ada</id>
<content type='text'>
Andrey Konovalov reported crashes in ipv4_mtu()

I could reproduce the issue with KASAN kernels, between
10.246.7.151 and 10.246.7.152 :

1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &amp;

2) At the same time run following loop :
while :
do
 ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done

Cong Wang attempted to add back rt-&gt;fi in commit
82486aa6f1b9 ("ipv4: restore rt-&gt;fi for reference counting")
but this proved to add some issues that were complex to solve.

Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)

I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.

Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.

Fixes: 2860583fe840 ("ipv4: Kill rt-&gt;fi")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Reviewed-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
