<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/net/core/dev.c, branch v3.12.13</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.12.13</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.12.13'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2013-12-08T15:29:14Z</updated>
<entry>
<title>net: core: Always propagate flag changes to interfaces</title>
<updated>2013-12-08T15:29:14Z</updated>
<author>
<name>Vlad Yasevich</name>
<email>vyasevic@redhat.com</email>
</author>
<published>2013-11-20T01:47:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a6ec437be3d70e9ba85fb34941a96577dcbb8414'/>
<id>urn:sha1:a6ec437be3d70e9ba85fb34941a96577dcbb8414</id>
<content type='text'>
[ Upstream commit d2615bf450694c1302d86b9cc8a8958edfe4c3a4 ]

The following commit:
    b6c40d68ff6498b7f63ddf97cf0aa818d748dee7
    net: only invoke dev-&gt;change_rx_flags when device is UP

tried to fix a problem with VLAN devices and promiscuouse flag setting.
The issue was that VLAN device was setting a flag on an interface that
was down, thus resulting in bad promiscuity count.
This commit blocked flag propagation to any device that is currently
down.

A later commit:
    deede2fabe24e00bd7e246eb81cd5767dc6fcfc7
    vlan: Don't propagate flag changes on down interfaces

fixed VLAN code to only propagate flags when the VLAN interface is up,
thus fixing the same issue as above, only localized to VLAN.

The problem we have now is that if we have create a complex stack
involving multiple software devices like bridges, bonds, and vlans,
then it is possible that the flags would not propagate properly to
the physical devices.  A simple examle of the scenario is the
following:

  eth0----&gt; bond0 ----&gt; bridge0 ---&gt; vlan50

If bond0 or eth0 happen to be down at the time bond0 is added to
the bridge, then eth0 will never have promisc mode set which is
currently required for operation as part of the bridge.  As a
result, packets with vlan50 will be dropped by the interface.

The only 2 devices that implement the special flag handling are
VLAN and DSA and they both have required code to prevent incorrect
flag propagation.  As a result we can remove the generic solution
introduced in b6c40d68ff6498b7f63ddf97cf0aa818d748dee7 and leave
it to the individual devices to decide whether they will block
flag propagation or not.

Reported-by: Stefan Priebe &lt;s.priebe@profihost.ag&gt;
Suggested-by: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: Vlad Yasevich &lt;vyasevic@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>core/dev: do not ignore dmac in dev_forward_skb()</title>
<updated>2013-12-08T15:29:11Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2013-11-12T22:39:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c7cf12c15986f580ce4f75ebbcbb8659eb96a506'/>
<id>urn:sha1:c7cf12c15986f580ce4f75ebbcbb8659eb96a506</id>
<content type='text'>
[ Upstream commit 81b9eab5ebbf0d5d54da4fc168cfb02c2adc76b8 ]

commit 06a23fe31ca3
("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
and refactoring 64261f230a91
("dev: move skb_scrub_packet() after eth_type_trans()")

are forcing pkt_type to be PACKET_HOST when skb traverses veth.

which means that ip forwarding will kick in inside netns
even if skb-&gt;eth-&gt;h_dest != dev-&gt;dev_addr

Fix order of eth_type_trans() and skb_scrub_packet() in dev_forward_skb()
and in ip_tunnel_rcv()

Fixes: 06a23fe31ca3 ("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
CC: Isaku Yamahata &lt;yamahatanetdev@gmail.com&gt;
CC: Maciej Zenczykowski &lt;zenczykowski@gmail.com&gt;
CC: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>netif_set_xps_queue: make cpu mask const</title>
<updated>2013-10-07T16:29:26Z</updated>
<author>
<name>Michael S. Tsirkin</name>
<email>mst@redhat.com</email>
</author>
<published>2013-10-02T06:14:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3573540cafa4296dd60f8be02f2aecaa31047525'/>
<id>urn:sha1:3573540cafa4296dd60f8be02f2aecaa31047525</id>
<content type='text'>
virtio wants to pass in cpumask_of(cpu), make parameter
const to avoid build warnings.

Signed-off-by: Michael S. Tsirkin &lt;mst@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Delay default_device_exit_batch until no devices are unregistering v2</title>
<updated>2013-09-28T22:09:15Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2013-09-24T04:19:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=50624c934db18ab90aaea4908f60dd39aab4e6e5'/>
<id>urn:sha1:50624c934db18ab90aaea4908f60dd39aab4e6e5</id>
<content type='text'>
There is currently serialization network namespaces exiting and
network devices exiting as the final part of netdev_run_todo does not
happen under the rtnl_lock.  This is compounded by the fact that the
only list of devices unregistering in netdev_run_todo is local to the
netdev_run_todo.

This lack of serialization in extreme cases results in network devices
unregistering in netdev_run_todo after the loopback device of their
network namespace has been freed (making dst_ifdown unsafe), and after
the their network namespace has exited (making the NETDEV_UNREGISTER,
and NETDEV_UNREGISTER_FINAL callbacks unsafe).

Add the missing serialization by a per network namespace count of how
many network devices are unregistering and having a wait queue that is
woken up whenever the count is decreased.  The count and wait queue
allow default_device_exit_batch to wait until all of the unregistration
activity for a network namespace has finished before proceeding to
unregister the loopback device and then allowing the network namespace
to exit.

Only a single global wait queue is used because there is a single global
lock, and there is a single waiter, per network namespace wait queues
would be a waste of resources.

The per network namespace count of unregistering devices gives a
progress guarantee because the number of network devices unregistering
in an exiting network namespace must ultimately drop to zero (assuming
network device unregistration completes).

The basic logic remains the same as in v1.  This patch is now half
comment and half rtnl_lock_unregistering an expanded version of
wait_event performs no extra work in the common case where no network
devices are unregistering when we get to default_device_exit_batch.

Reported-by: Francesco Ruggeri &lt;fruggeri@aristanetworks.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: correctly interlink lower/upper devices</title>
<updated>2013-09-04T04:27:26Z</updated>
<author>
<name>Veaceslav Falico</name>
<email>vfalico@redhat.com</email>
</author>
<published>2013-09-02T14:26:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=82476b316084e6826a9dd339d1dad892a598af9a'/>
<id>urn:sha1:82476b316084e6826a9dd339d1dad892a598af9a</id>
<content type='text'>
Currently we're linking upper devices to lower ones, which results in
upside-down relationship: upper devices seeing lower devices via its upper
lists.

Fix this by correctly linking lower devices to the upper ones.

CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: Eric Dumazet &lt;edumazet@google.com&gt;
CC: Jiri Pirko &lt;jiri@resnulli.us&gt;
CC: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
CC: Cong Wang &lt;amwang@redhat.com&gt;
Signed-off-by: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>skb: allow skb_scrub_packet() to be used by tunnels</title>
<updated>2013-09-04T04:27:25Z</updated>
<author>
<name>Nicolas Dichtel</name>
<email>nicolas.dichtel@6wind.com</email>
</author>
<published>2013-09-02T13:34:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8b27f27797cac5ed9b2f3e63dac89a7ae70e70a7'/>
<id>urn:sha1:8b27f27797cac5ed9b2f3e63dac89a7ae70e70a7</id>
<content type='text'>
This function was only used when a packet was sent to another netns. Now, it can
also be used after tunnel encapsulation or decapsulation.

Only skb_orphan() should not be done when a packet is not crossing netns.

Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: add netdev_upper_get_next_dev_rcu(dev, iter)</title>
<updated>2013-08-29T20:19:42Z</updated>
<author>
<name>Veaceslav Falico</name>
<email>vfalico@redhat.com</email>
</author>
<published>2013-08-28T21:25:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=48311f46853c0361f9fba7e0e6bb1652d633c049'/>
<id>urn:sha1:48311f46853c0361f9fba7e0e6bb1652d633c049</id>
<content type='text'>
This function returns the next dev in the dev-&gt;upper_dev_list after the
struct list_head **iter position, and updates *iter accordingly. Returns
NULL if there are no devices left.

Caller must hold RCU read lock.

CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: Eric Dumazet &lt;edumazet@google.com&gt;
CC: Jiri Pirko &lt;jiri@resnulli.us&gt;
CC: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
CC: Cong Wang &lt;amwang@redhat.com&gt;
Signed-off-by: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: remove search_list from netdev_adjacent</title>
<updated>2013-08-29T20:19:42Z</updated>
<author>
<name>Veaceslav Falico</name>
<email>vfalico@redhat.com</email>
</author>
<published>2013-08-28T21:25:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=620f3186caa8124e0efaf329751cf51c5d55c731'/>
<id>urn:sha1:620f3186caa8124e0efaf329751cf51c5d55c731</id>
<content type='text'>
We already don't need it cause we see every upper/lower device in the list
already.

CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: Eric Dumazet &lt;edumazet@google.com&gt;
CC: Jiri Pirko &lt;jiri@resnulli.us&gt;
CC: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
CC: Cong Wang &lt;amwang@redhat.com&gt;
Signed-off-by: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: add lower_dev_list to net_device and make a full mesh</title>
<updated>2013-08-29T20:19:42Z</updated>
<author>
<name>Veaceslav Falico</name>
<email>vfalico@redhat.com</email>
</author>
<published>2013-08-28T21:25:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5d261913ca3daf6c2d21d38924235667b3d07c40'/>
<id>urn:sha1:5d261913ca3daf6c2d21d38924235667b3d07c40</id>
<content type='text'>
This patch adds lower_dev_list list_head to net_device, which is the same
as upper_dev_list, only for lower devices, and begins to use it in the same
way as the upper list.

It also changes the way the whole adjacent device lists work - now they
contain *all* of upper/lower devices, not only the first level. The first
level devices are distinguished by the bool neighbour field in
netdev_adjacent, also added by this patch.

There are cases when a device can be added several times to the adjacent
list, the simplest would be:

     /---- eth0.10 ---\
eth0-		       --- bond0
     \---- eth0.20 ---/

where both bond0 and eth0 'see' each other in the adjacent lists two times.
To avoid duplication of netdev_adjacent structures ref_nr is being kept as
the number of times the device was added to the list.

The 'full view' is achieved by adding, on link creation, all of the
upper_dev's upper_dev_list devices as upper devices to all of the
lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice
versa. On unlink they are removed using the same logic.

I've tested it with thousands vlans/bonds/bridges, everything works ok and
no observable lags even on a huge number of interfaces.

Memory footprint for 128 devices interconnected with each other via both
upper and lower (which is impossible, but for the comparison) lists would be:

128*128*2*sizeof(netdev_adjacent) = 1.5MB

but in the real world we usualy have at most several devices with slaves
and a lot of vlans, so the footprint will be much lower.

CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: Eric Dumazet &lt;edumazet@google.com&gt;
CC: Jiri Pirko &lt;jiri@resnulli.us&gt;
CC: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
CC: Cong Wang &lt;amwang@redhat.com&gt;
Signed-off-by: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: rename netdev_upper to netdev_adjacent</title>
<updated>2013-08-29T20:19:41Z</updated>
<author>
<name>Veaceslav Falico</name>
<email>vfalico@redhat.com</email>
</author>
<published>2013-08-28T21:25:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aa9d85605f5ab070b64842b3eba797cf81698ae1'/>
<id>urn:sha1:aa9d85605f5ab070b64842b3eba797cf81698ae1</id>
<content type='text'>
Rename the structure to reflect the upcoming addition of lower_dev_list.

CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: Eric Dumazet &lt;edumazet@google.com&gt;
CC: Jiri Pirko &lt;jiri@resnulli.us&gt;
CC: Alexander Duyck &lt;alexander.h.duyck@intel.com&gt;
CC: Cong Wang &lt;amwang@redhat.com&gt;
Signed-off-by: Veaceslav Falico &lt;vfalico@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
