<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/net/ip_vs.h, branch v4.1.42</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.42</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.42'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-07-11T03:07:16Z</updated>
<entry>
<title>ipvs: drop first packet to redirect conntrack</title>
<updated>2016-07-11T03:07:16Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2016-03-05T13:03:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7361f006e36e20a81edede6724be39e718d4347f'/>
<id>urn:sha1:7361f006e36e20a81edede6724be39e718d4347f</id>
<content type='text'>
[ Upstream commit f719e3754ee2f7275437e61a6afd520181fdd43b ]

Jiri Bohac is reporting for a problem where the attempt
to reschedule existing connection to another real server
needs proper redirect for the conntrack used by the IPVS
connection. For example, when IPVS connection is created
to NAT-ed real server we alter the reply direction of
conntrack. If we later decide to select different real
server we can not alter again the conntrack. And if we
expire the old connection, the new connection is left
without conntrack.

So, the only way to redirect both the IPVS connection and
the Netfilter's conntrack is to drop the SYN packet that
hits existing connection, to wait for the next jiffie
to expire the old connection and its conntrack and to rely
on client's retransmission to create new connection as
usually.

Jiri Bohac provided a fix that drops all SYNs on rescheduling,
I extended his patch to do such drops only for connections
that use conntrack. Here is the original report from Jiri Bohac:

Since commit dc7b3eb900aa ("ipvs: Fix reuse connection if real server
is dead"), new connections to dead servers are redistributed
immediately to new servers.  The old connection is expired using
ip_vs_conn_expire_now() which sets the connection timer to expire
immediately.

However, before the timer callback, ip_vs_conn_expire(), is run
to clean the connection's conntrack entry, the new redistributed
connection may already be established and its conntrack removed
instead.

Fix this by dropping the first packet of the new connection
instead, like we do when the destination server is not available.
The timer will have deleted the old conntrack entry long before
the first packet of the new connection is retransmitted.

Fixes: dc7b3eb900aa ("ipvs: Fix reuse connection if real server is dead")
Signed-off-by: Jiri Bohac &lt;jbohac@suse.cz&gt;
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>net: Introduce possible_net_t</title>
<updated>2015-03-12T18:39:40Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2015-03-12T04:06:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0c5c9fb55106333e773de8c9dd321fa8240caeb3'/>
<id>urn:sha1:0c5c9fb55106333e773de8c9dd321fa8240caeb3</id>
<content type='text'>
Having to say
&gt; #ifdef CONFIG_NET_NS
&gt; 	struct net *net;
&gt; #endif

in structures is a little bit wordy and a little bit error prone.

Instead it is possible to say:
&gt; typedef struct {
&gt; #ifdef CONFIG_NET_NS
&gt;       struct net *net;
&gt; #endif
&gt; } possible_net_t;

And then in a header say:

&gt; 	possible_net_t net;

Which is cleaner and easier to use and easier to test, as the
possible_net_t is always there no matter what the compile options.

Further this allows read_pnet and write_pnet to be functions in all
cases which is better at catching typos.

This change adds possible_net_t, updates the definitions of read_pnet
and write_pnet, updates optional struct net * variables that
write_pnet uses on to have the type possible_net_t, and finally fixes
up the b0rked users of read_pnet and write_pnet.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipvs: allow rescheduling of new connections when port reuse is detected</title>
<updated>2015-02-25T04:46:35Z</updated>
<author>
<name>Marcelo Ricardo Leitner</name>
<email>mleitner@redhat.com</email>
</author>
<published>2015-02-23T18:02:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d752c364571743d696c2a54a449ce77550c35ac5'/>
<id>urn:sha1:d752c364571743d696c2a54a449ce77550c35ac5</id>
<content type='text'>
Currently, when TCP/SCTP port reusing happens, IPVS will find the old
entry and use it for the new one, behaving like a forced persistence.
But if you consider a cluster with a heavy load of small connections,
such reuse will happen often and may lead to a not optimal load
balancing and might prevent a new node from getting a fair load.

This patch introduces a new sysctl, conn_reuse_mode, that allows
controlling how to proceed when port reuse is detected. The default
value will allow rescheduling of new connections only if the old entry
was in TIME_WAIT state for TCP or CLOSED for SCTP.

Signed-off-by: Marcelo Ricardo Leitner &lt;mleitner@redhat.com&gt;
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
</content>
</entry>
<entry>
<title>ipvs: use 64-bit rates in stats</title>
<updated>2015-02-09T07:59:03Z</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2015-02-06T07:44:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cd67cd5eb25ae9a7bafbfd3d52d4c05e1d80af3b'/>
<id>urn:sha1:cd67cd5eb25ae9a7bafbfd3d52d4c05e1d80af3b</id>
<content type='text'>
IPVS stats are limited to 2^(32-10) conns/s and packets/s,
2^(32-5) bytes/s. It is time to use 64 bits:

* Change all conn/packet kernel counters to 64-bit and update
them in u64_stats_update_{begin,end} section

* In kernel use struct ip_vs_kstats instead of the user-space
struct ip_vs_stats_user and use new func ip_vs_export_stats_user
to export it to sockopt users to preserve compatibility with
32-bit values

* Rename cpu counters "ustats" to "cnt"

* To netlink users provide additionally 64-bit stats:
IPVS_SVC_ATTR_STATS64 and IPVS_DEST_ATTR_STATS64. Old stats
remain for old binaries.

* We can use ip_vs_copy_stats in ip_vs_stats_percpu_show

Thanks to Chris Caputo for providing initial patch for ip_vs_est.c

Signed-off-by: Chris Caputo &lt;ccaputo@alt.net&gt;
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
</content>
</entry>
<entry>
<title>ipvs: Clean up comment style in ip_vs.h</title>
<updated>2014-10-02T16:30:58Z</updated>
<author>
<name>Simon Horman</name>
<email>horms@verge.net.au</email>
</author>
<published>2014-09-30T01:50:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=07dcc686fa8f6667dec4696804cdb43a90267b9a'/>
<id>urn:sha1:07dcc686fa8f6667dec4696804cdb43a90267b9a</id>
<content type='text'>
* Consistently use the multi-line comment style for networking code:

  /* This
   * That
   * The other thing
   */

* Use single-line comment style for comments with only one line of text.

* In general follow the leading '*' of each line of a comment with a
  single space and then text.

* Add missing line break between functions, remove double line break,
  align comments to previous lines whenever possible.

Reported-by: Sergei Shtylyov &lt;sergei.shtylyov@cogentembedded.com&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
</entry>
<entry>
<title>ipvs: prevent mixing heterogeneous pools and synchronization</title>
<updated>2014-09-16T00:03:35Z</updated>
<author>
<name>Alex Gartrell</name>
<email>agartrell@fb.com</email>
</author>
<published>2014-09-09T23:40:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=391f503d69779867f05e9296ae523e9002c2d7ee'/>
<id>urn:sha1:391f503d69779867f05e9296ae523e9002c2d7ee</id>
<content type='text'>
The synchronization protocol is not compatible with heterogeneous pools, so
we need to verify that we're not turning both on at the same time.

Signed-off-by: Alex Gartrell &lt;agartrell@fb.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
</content>
</entry>
<entry>
<title>ipvs: Supply destination address family to ip_vs_conn_new</title>
<updated>2014-09-16T00:03:34Z</updated>
<author>
<name>Alex Gartrell</name>
<email>agartrell@fb.com</email>
</author>
<published>2014-09-09T23:40:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ba38528aae6ee2d22226c6a78727ddc13512b068'/>
<id>urn:sha1:ba38528aae6ee2d22226c6a78727ddc13512b068</id>
<content type='text'>
The assumption that dest af is equal to service af is now unreliable, so we
must specify it manually so as not to copy just the first 4 bytes of a v6
address or doing an illegal read of 16 butes on a v6 address.

We "lie" in two places: for synchronization (which we will explicitly
disallow from happening when we have heterogeneous pools) and for black
hole addresses where there's no real dest.

Signed-off-by: Alex Gartrell &lt;agartrell@fb.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
</content>
</entry>
<entry>
<title>ipvs: Supply destination addr family to ip_vs_{lookup_dest,find_dest}</title>
<updated>2014-09-16T00:03:33Z</updated>
<author>
<name>Alex Gartrell</name>
<email>agartrell@fb.com</email>
</author>
<published>2014-09-09T23:40:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=655eef103d0bd99f540a52f7ede032e120756846'/>
<id>urn:sha1:655eef103d0bd99f540a52f7ede032e120756846</id>
<content type='text'>
We need to remove the assumption that virtual address family is the same as
real address family in order to support heterogeneous services (that is,
services with v4 vips and v6 backends or the opposite).

Signed-off-by: Alex Gartrell &lt;agartrell@fb.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
</content>
</entry>
<entry>
<title>ipvs: Add destination address family to netlink interface</title>
<updated>2014-09-16T00:03:33Z</updated>
<author>
<name>Alex Gartrell</name>
<email>agartrell@fb.com</email>
</author>
<published>2014-09-09T23:40:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6cff339bbd5f9eda7a5e8a521f91a88d046e6d0c'/>
<id>urn:sha1:6cff339bbd5f9eda7a5e8a521f91a88d046e6d0c</id>
<content type='text'>
This is necessary to support heterogeneous pools.  For example, if you have
an ipv6 addressed network, you'll want to be able to forward ipv4 traffic
into it.

This patch enforces that destination address family is the same as service
family, as none of the forwarding mechanisms support anything else.

For the old setsockopt mechanism, we simply set the dest address family to
AF_INET as we do with the service.

Signed-off-by: Alex Gartrell &lt;agartrell@fb.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
</content>
</entry>
<entry>
<title>arch: Mass conversion of smp_mb__*()</title>
<updated>2014-04-18T12:20:48Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2014-03-17T17:06:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4e857c58efeb99393cba5a5d0d8ec7117183137c'/>
<id>urn:sha1:4e857c58efeb99393cba5a5d0d8ec7117183137c</id>
<content type='text'>
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
</feed>
