<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/net/tipc/socket.h, branch v4.20</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.20</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.20'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-09-07T04:49:18Z</updated>
<entry>
<title>tipc: call start and done ops directly in __tipc_nl_compat_dumpit()</title>
<updated>2018-09-07T04:49:18Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2018-09-04T21:54:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8f5c5fcf353302374b36232d6885c1a3b579e5ca'/>
<id>urn:sha1:8f5c5fcf353302374b36232d6885c1a3b579e5ca</id>
<content type='text'>
__tipc_nl_compat_dumpit() uses a netlink_callback on stack,
so the only way to align it with other -&gt;dumpit() call path
is calling tipc_dump_start() and tipc_dump_done() directly
inside it. Otherwise -&gt;dumpit() would always get NULL from
cb-&gt;args[].

But tipc_dump_start() uses sock_net(cb-&gt;skb-&gt;sk) to retrieve
net pointer, the cb-&gt;skb here doesn't set skb-&gt;sk, the net pointer
is saved in msg-&gt;net instead, so introduce a helper function
__tipc_dump_start() to pass in msg-&gt;net.

Ying pointed out cb-&gt;args[0...3] are already used by other
callbacks on this call path, so we can't use cb-&gt;args[0] any
more, use cb-&gt;args[4] instead.

Fixes: 9a07efa9aea2 ("tipc: switch to rhashtable iterator")
Reported-and-tested-by: syzbot+e93a2c41f91b8e2c7d9b@syzkaller.appspotmail.com
Cc: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Cc: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: switch to rhashtable iterator</title>
<updated>2018-08-30T01:04:54Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2018-08-24T19:28:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9a07efa9aea2f4a59f35da0785a4e6a6b5a96192'/>
<id>urn:sha1:9a07efa9aea2f4a59f35da0785a4e6a6b5a96192</id>
<content type='text'>
syzbot reported a use-after-free in tipc_group_fill_sock_diag(),
where tipc_group_fill_sock_diag() still reads tsk-&gt;group meanwhile
tipc_group_delete() just deletes it in tipc_release().

tipc_nl_sk_walk() aims to lock this sock when walking each sock
in the hash table to close race conditions with sock changes like
this one, by acquiring tsk-&gt;sk.sk_lock.slock spinlock, unfortunately
this doesn't work at all. All non-BH call path should take
lock_sock() instead to make it work.

tipc_nl_sk_walk() brutally iterates with raw rht_for_each_entry_rcu()
where RCU read lock is required, this is the reason why lock_sock()
can't be taken on this path. This could be resolved by switching to
rhashtable iterator API's, where taking a sleepable lock is possible.
Also, the iterator API's are friendly for restartable calls like
diag dump, the last position is remembered behind the scence,
all we need to do here is saving the iterator into cb-&gt;args[].

I tested this with parallel tipc diag dump and thousands of tipc
socket creation and release, no crash or memory leak.

Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com
Cc: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Cc: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: use the right skb in tipc_sk_fill_sock_diag()</title>
<updated>2018-04-08T16:34:29Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2018-04-07T01:54:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e41f0548473eb7b6499bd8482474e30ae6d31220'/>
<id>urn:sha1:e41f0548473eb7b6499bd8482474e30ae6d31220</id>
<content type='text'>
Commit 4b2e6877b879 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag")
tried to fix the crash but failed, the crash is still 100% reproducible
with it.

In tipc_sk_fill_sock_diag(), skb is the diag dump we are filling, it is not
correct to retrieve its NETLINK_CB(), instead, like other protocol diag,
we should use NETLINK_CB(cb-&gt;skb).sk here.

Reported-by: &lt;syzbot+326e587eff1074657718@syzkaller.appspotmail.com&gt;
Fixes: 4b2e6877b879 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag")
Fixes: c30b70deb5f4 (tipc: implement socket diagnostics for AF_TIPC)
Cc: GhantaKrishnamurthy MohanKrishna &lt;mohan.krishna.ghanta.krishnamurthy@ericsson.com&gt;
Cc: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Cc: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: implement socket diagnostics for AF_TIPC</title>
<updated>2018-03-22T18:43:35Z</updated>
<author>
<name>GhantaKrishnamurthy MohanKrishna</name>
<email>mohan.krishna.ghanta.krishnamurthy@ericsson.com</email>
</author>
<published>2018-03-21T13:37:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c30b70deb5f4861f590031c33fd3ec6cc63f1df1'/>
<id>urn:sha1:c30b70deb5f4861f590031c33fd3ec6cc63f1df1</id>
<content type='text'>
This commit adds socket diagnostics capability for AF_TIPC in netlink
family NETLINK_SOCK_DIAG in a new kernel module (diag.ko).

The following are key design considerations:
- config TIPC_DIAG has default y, like INET_DIAG.
- only requests with flag NLM_F_DUMP is supported (dump all).
- tipc_sock_diag_req message is introduced to send filter parameters.
- the response attributes are of TLV, some nested.

To avoid exposing data structures between diag and tipc modules and
avoid code duplication, the following additions are required:
- export tipc_nl_sk_walk function to reuse socket iterator.
- export tipc_sk_fill_sock_diag to fill the tipc diag attributes.
- create a sock_diag response message in __tipc_add_sock_diag defined
  in diag.c and use the above exported tipc_sk_fill_sock_diag
  to fill response.

Acked-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: GhantaKrishnamurthy MohanKrishna &lt;mohan.krishna.ghanta.krishnamurthy@ericsson.com&gt;
Signed-off-by: Parthasarathy Bhuvaragan &lt;parthasarathy.bhuvaragan@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: redesign connection-level flow control</title>
<updated>2016-05-03T19:51:16Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2016-05-02T15:58:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=10724cc7bb7832b482df049c20fd824d928c5eaa'/>
<id>urn:sha1:10724cc7bb7832b482df049c20fd824d928c5eaa</id>
<content type='text'>
There are two flow control mechanisms in TIPC; one at link level that
handles network congestion, burst control, and retransmission, and one
at connection level which' only remaining task is to prevent overflow
in the receiving socket buffer. In TIPC, the latter task has to be
solved end-to-end because messages can not be thrown away once they
have been accepted and delivered upwards from the link layer, i.e, we
can never permit the receive buffer to overflow.

Currently, this algorithm is message based. A counter in the receiving
socket keeps track of number of consumed messages, and sends a dedicated
acknowledge message back to the sender for each 256 consumed message.
A counter at the sending end keeps track of the sent, not yet
acknowledged messages, and blocks the sender if this number ever reaches
512 unacknowledged messages. When the missing acknowledge arrives, the
socket is then woken up for renewed transmission. This works well for
keeping the message flow running, as it almost never happens that a
sender socket is blocked this way.

A problem with the current mechanism is that it potentially is very
memory consuming. Since we don't distinguish between small and large
messages, we have to dimension the socket receive buffer according
to a worst-case of both. I.e., the window size must be chosen large
enough to sustain a reasonable throughput even for the smallest
messages, while we must still consider a scenario where all messages
are of maximum size. Hence, the current fix window size of 512 messages
and a maximum message size of 66k results in a receive buffer of 66 MB
when truesize(66k) = 131k is taken into account. It is possible to do
much better.

This commit introduces an algorithm where we instead use 1024-byte
blocks as base unit. This unit, always rounded upwards from the
actual message size, is used when we advertise windows as well as when
we count and acknowledge transmitted data. The advertised window is
based on the configured receive buffer size in such a way that even
the worst-case truesize/msgsize ratio always is covered. Since the
smallest possible message size (from a flow control viewpoint) now is
1024 bytes, we can safely assume this ratio to be less than four, which
is the value we are now using.

This way, we have been able to reduce the default receive buffer size
from 66 MB to 2 MB with maintained performance.

In order to keep this solution backwards compatible, we introduce a
new capability bit in the discovery protocol, and use this throughout
the message sending/reception path to always select the right unit.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: clean up socket layer message reception</title>
<updated>2015-07-26T23:31:50Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2015-07-22T14:11:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cda3696d3d26eb798c94de0dab5bd66ddb5627cb'/>
<id>urn:sha1:cda3696d3d26eb798c94de0dab5bd66ddb5627cb</id>
<content type='text'>
When a message is received in a socket, one of the call chains
tipc_sk_rcv()-&gt;tipc_sk_enqueue()-&gt;filter_rcv()(-&gt;tipc_sk_proto_rcv())
or
tipc_sk_backlog_rcv()-&gt;filter_rcv()(-&gt;tipc_sk_proto_rcv())
are followed. At each of these levels we may encounter situations
where the message may need to be rejected, or a new message
produced for transfer back to the sender. Despite recent
improvements, the current code for doing this is perceived
as awkward and hard to follow.

Leveraging the two previous commits in this series, we now
introduce a more uniform handling of such situations. We
let each of the functions in the chain itself produce/reverse
the message to be returned to the sender, but also perform the
actual forwarding. This simplifies the necessary logics within
each function.

Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: fix netns refcnt leak</title>
<updated>2015-03-18T02:11:26Z</updated>
<author>
<name>Ying Xue</name>
<email>ying.xue@windriver.com</email>
</author>
<published>2015-03-18T01:32:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=76100a8a64bc2ae898bc49d51dd28c1f4f5ed37b'/>
<id>urn:sha1:76100a8a64bc2ae898bc49d51dd28c1f4f5ed37b</id>
<content type='text'>
When the TIPC module is loaded, we launch a topology server in kernel
space, which in its turn is creating TIPC sockets for communication
with topology server users. Because both the socket's creator and
provider reside in the same module, it is necessary that the TIPC
module's reference count remains zero after the server is started and
the socket created; otherwise it becomes impossible to perform "rmmod"
even on an idle module.

Currently, we achieve this by defining a separate "tipc_proto_kern"
protocol struct, that is used only for kernel space socket allocations.
This structure has the "owner" field set to NULL, which restricts the
module reference count from being be bumped when sk_alloc() for local
sockets is called. Furthermore, we have defined three kernel-specific
functions, tipc_sock_create_local(), tipc_sock_release_local() and
tipc_sock_accept_local(), to avoid the module counter being modified
when module local sockets are created or deleted. This has worked well
until we introduced name space support.

However, after name space support was introduced, we have observed that
a reference count leak occurs, because the netns counter is not
decremented in tipc_sock_delete_local().

This commit remedies this problem. But instead of just modifying
tipc_sock_delete_local(), we eliminate the whole parallel socket
handling infrastructure, and start using the regular sk_create_kern(),
kernel_accept() and sk_release_kernel() calls. Since those functions
manipulate the module counter, we must now compensate for that by
explicitly decrementing the counter after module local sockets are
created, and increment it just before calling sk_release_kernel().

Fixes: a62fbccecd62 ("tipc: make subscriber server support net namespace")
Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Reviewed-by: Jon Maloy &lt;jon.maloy@ericson.com&gt;
Reviewed-by: Erik Hugne &lt;erik.hugne@ericsson.com&gt;
Reported-by: Cong Wang &lt;cwang@twopensource.com&gt;
Tested-by: Erik Hugne &lt;erik.hugne@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: convert legacy nl socket dump to nl compat</title>
<updated>2015-02-09T21:20:48Z</updated>
<author>
<name>Richard Alpe</name>
<email>richard.alpe@ericsson.com</email>
</author>
<published>2015-02-09T08:50:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=487d2a3a1326d339ce273ffbcd03247f2b7b052e'/>
<id>urn:sha1:487d2a3a1326d339ce273ffbcd03247f2b7b052e</id>
<content type='text'>
Convert socket (port) listing to compat dumpit call. If a socket
(port) has publications a second dumpit call is issued to collect them
and format then into the legacy buffer before continuing to process
the sockets (ports).

Command converted in this patch:
TIPC_CMD_SHOW_PORTS

Signed-off-by: Richard Alpe &lt;richard.alpe@ericsson.com&gt;
Reviewed-by: Erik Hugne &lt;erik.hugne@ericsson.com&gt;
Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Reviewed-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: eliminate race condition at multicast reception</title>
<updated>2015-02-06T00:00:03Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2015-02-05T13:36:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cb1b728096f54e7408d60fb571944bed00c5b771'/>
<id>urn:sha1:cb1b728096f54e7408d60fb571944bed00c5b771</id>
<content type='text'>
In a previous commit in this series we resolved a race problem during
unicast message reception.

Here, we resolve the same problem at multicast reception. We apply the
same technique: an input queue serializing the delivery of arriving
buffers. The main difference is that here we do it in two steps.
First, the broadcast link feeds arriving buffers into the tail of an
arrival queue, which head is consumed at the socket level, and where
destination lookup is performed. Second, if the lookup is successful,
the resulting buffer clones are fed into a second queue, the input
queue. This queue is consumed at reception in the socket just like
in the unicast case. Both queues are protected by the same lock, -the
one of the input queue.

Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tipc: simplify socket multicast reception</title>
<updated>2015-02-06T00:00:03Z</updated>
<author>
<name>Jon Paul Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2015-02-05T13:36:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c724acdd5049907555a831f814bfd5927c3350c'/>
<id>urn:sha1:3c724acdd5049907555a831f814bfd5927c3350c</id>
<content type='text'>
The structure 'tipc_port_list' is used to collect port numbers
representing multicast destination socket on a receiving node.
The list is not based on a standard linked list, and is in reality
optimized for the uncommon case that there are more than one
multicast destinations per node. This makes the list handling
unecessarily complex, and as a consequence, even the socket
multicast reception becomes more complex.

In this commit, we replace 'tipc_port_list' with a new 'struct
tipc_plist', which is based on a standard list. We give the new
list stack (push/pop) semantics, someting that simplifies
the implementation of the function tipc_sk_mcast_rcv().

Reviewed-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
