| Age | Commit message (Collapse) | Author |
|
commit 62ffc589abb176821662efc4525ee4ac0b9c3894 upstream.
Refactor the fastreuse update code in inet_csk_get_port into a small
helper function that can be called from other places.
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1e105e6afa6c3d32bfb52c00ffa393894a525c27 ]
Following bug was reported via irc:
nft list ruleset
set knock_candidates_ipv4 {
type ipv4_addr . inet_service
size 65535
elements = { 127.0.0.1 . 123,
127.0.0.1 . 123 }
}
..
udp dport 123 add @knock_candidates_ipv4 { ip saddr . 123 }
udp dport 123 add @knock_candidates_ipv4 { ip saddr . udp dport }
It should not have been possible to add a duplicate set entry.
After some debugging it turned out that the problem is the immediate
value (123) in the second-to-last rule.
Concatenations use 32bit registers, i.e. the elements are 8 bytes each,
not 6 and it turns out the kernel inserted
inet firewall @knock_candidates_ipv4
element 0100007f ffff7b00 : 0 [end]
element 0100007f 00007b00 : 0 [end]
Note the non-zero upper bits of the first element. It turns out that
nft_immediate doesn't zero the destination register, but this is needed
when the length isn't a multiple of 4.
Furthermore, the zeroing in nft_payload is broken. We can't use
[len / 4] = 0 -- if len is a multiple of 4, index is off by one.
Skip zeroing in this case and use a conditional instead of (len -1) / 4.
Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d9539752d23283db4692384a634034f451261e29 ]
Add missed sock updates to compat path via a new helper, which will be
used more in coming patches. (The net/core/scm.c code is left as-is here
to assist with -stable backports for the compat path.)
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8c0de6e96c9794cb523a516c465991a70245da1c ]
IPV6_ADDRFORM causes resource leaks when converting an IPv6 socket
to IPv4, particularly struct ipv6_ac_socklist. Similar to
struct ipv6_mc_socklist, we should just close it on this path.
This bug can be easily reproduced with the following C program:
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
int main()
{
int s, value;
struct sockaddr_in6 addr;
struct ipv6_mreq m6;
s = socket(AF_INET6, SOCK_DGRAM, 0);
addr.sin6_family = AF_INET6;
addr.sin6_port = htons(5000);
inet_pton(AF_INET6, "::ffff:192.168.122.194", &addr.sin6_addr);
connect(s, (struct sockaddr *)&addr, sizeof(addr));
inet_pton(AF_INET6, "fe80::AAAA", &m6.ipv6mr_multiaddr);
m6.ipv6mr_interface = 5;
setsockopt(s, SOL_IPV6, IPV6_JOIN_ANYCAST, &m6, sizeof(m6));
value = AF_INET;
setsockopt(s, SOL_IPV6, IPV6_ADDRFORM, &value, sizeof(value));
close(s);
return 0;
}
Reported-by: ch3332xr@gmail.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1e82a62fec613844da9e558f3493540a5b7a7b67 ]
A potential deadlock can occur during registering or unregistering a
new generic netlink family between the main nl_table_lock and the
cb_lock where each thread wants the lock held by the other, as
demonstrated below.
1) Thread 1 is performing a netlink_bind() operation on a socket. As part
of this call, it will call netlink_lock_table(), incrementing the
nl_table_users count to 1.
2) Thread 2 is registering (or unregistering) a genl_family via the
genl_(un)register_family() API. The cb_lock semaphore will be taken for
writing.
3) Thread 1 will call genl_bind() as part of the bind operation to handle
subscribing to GENL multicast groups at the request of the user. It will
attempt to take the cb_lock semaphore for reading, but it will fail and
be scheduled away, waiting for Thread 2 to finish the write.
4) Thread 2 will call netlink_table_grab() during the (un)registration
call. However, as Thread 1 has incremented nl_table_users, it will not
be able to proceed, and both threads will be stuck waiting for the
other.
genl_bind() is a noop, unless a genl_family implements the mcast_bind()
function to handle setting up family-specific multicast operations. Since
no one in-tree uses this functionality as Cong pointed out, simply removing
the genl_bind() function will remove the possibility for deadlock, as there
is no attempt by Thread 1 above to take the cb_lock semaphore.
Fixes: c380d9a7afff ("genetlink: pass multicast bind/unbind to families")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Johannes Berg <johannes.berg@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 394de110a73395de2ca4516b0de435e91b11b604 ]
The packets from tunnel devices (eg bareudp) may have only
metadata in the dst pointer of skb. Hence a pointer check of
neigh_lookup is needed in dst_neigh_lookup_skb
Kernel crashes when packets from bareudp device is processed in
the kernel neighbour subsytem.
[ 133.384484] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 133.385240] #PF: supervisor instruction fetch in kernel mode
[ 133.385828] #PF: error_code(0x0010) - not-present page
[ 133.386603] PGD 0 P4D 0
[ 133.386875] Oops: 0010 [#1] SMP PTI
[ 133.387275] CPU: 0 PID: 5045 Comm: ping Tainted: G W 5.8.0-rc2+ #15
[ 133.388052] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 133.391076] RIP: 0010:0x0
[ 133.392401] Code: Bad RIP value.
[ 133.394029] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
[ 133.396656] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
[ 133.399018] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
[ 133.399685] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
[ 133.400350] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
[ 133.401010] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
[ 133.401667] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
[ 133.402412] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 133.402948] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
[ 133.403611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 133.404270] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 133.404933] Call Trace:
[ 133.405169] <IRQ>
[ 133.405367] __neigh_update+0x5a4/0x8f0
[ 133.405734] arp_process+0x294/0x820
[ 133.406076] ? __netif_receive_skb_core+0x866/0xe70
[ 133.406557] arp_rcv+0x129/0x1c0
[ 133.406882] __netif_receive_skb_one_core+0x95/0xb0
[ 133.407340] process_backlog+0xa7/0x150
[ 133.407705] net_rx_action+0x2af/0x420
[ 133.408457] __do_softirq+0xda/0x2a8
[ 133.408813] asm_call_on_stack+0x12/0x20
[ 133.409290] </IRQ>
[ 133.409519] do_softirq_own_stack+0x39/0x50
[ 133.410036] do_softirq+0x50/0x60
[ 133.410401] __local_bh_enable_ip+0x50/0x60
[ 133.410871] ip_finish_output2+0x195/0x530
[ 133.411288] ip_output+0x72/0xf0
[ 133.411673] ? __ip_finish_output+0x1f0/0x1f0
[ 133.412122] ip_send_skb+0x15/0x40
[ 133.412471] raw_sendmsg+0x853/0xab0
[ 133.412855] ? insert_pfn+0xfe/0x270
[ 133.413827] ? vvar_fault+0xec/0x190
[ 133.414772] sock_sendmsg+0x57/0x80
[ 133.415685] __sys_sendto+0xdc/0x160
[ 133.416605] ? syscall_trace_enter+0x1d4/0x2b0
[ 133.417679] ? __audit_syscall_exit+0x1d9/0x280
[ 133.418753] ? __prepare_exit_to_usermode+0x5d/0x1a0
[ 133.419819] __x64_sys_sendto+0x24/0x30
[ 133.420848] do_syscall_64+0x4d/0x90
[ 133.421768] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 133.422833] RIP: 0033:0x7fe013689c03
[ 133.423749] Code: Bad RIP value.
[ 133.424624] RSP: 002b:00007ffc7288f418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 133.425940] RAX: ffffffffffffffda RBX: 000056151fc63720 RCX: 00007fe013689c03
[ 133.427225] RDX: 0000000000000040 RSI: 000056151fc63720 RDI: 0000000000000003
[ 133.428481] RBP: 00007ffc72890b30 R08: 000056151fc60500 R09: 0000000000000010
[ 133.429757] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 133.431041] R13: 000056151fc636e0 R14: 000056151fc616bc R15: 0000000000000080
[ 133.432481] Modules linked in: mpls_iptunnel act_mirred act_tunnel_key cls_flower sch_ingress veth mpls_router ip_tunnel bareudp ip6_udp_tunnel udp_tunnel macsec udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc xt_MASQUERADE iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc ebtable_filter ebtables overlay ip6table_filter ip6_tables iptable_filter sunrpc ext4 mbcache jbd2 pcspkr i2c_piix4 virtio_balloon joydev ip_tables xfs libcrc32c ata_generic qxl pata_acpi drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ata_piix libata virtio_net net_failover virtio_console failover virtio_blk i2c_core virtio_pci virtio_ring serio_raw floppy virtio dm_mirror dm_region_hash dm_log dm_mod
[ 133.444045] CR2: 0000000000000000
[ 133.445082] ---[ end trace f4aeee1958fd1638 ]---
[ 133.446236] RIP: 0010:0x0
[ 133.447180] Code: Bad RIP value.
[ 133.448152] RSP: 0018:ffffb79980003d50 EFLAGS: 00010246
[ 133.449363] RAX: 0000000080000102 RBX: ffff9de2fe0d6600 RCX: ffff9de2fe5e9d00
[ 133.450835] RDX: 0000000000000000 RSI: ffff9de2fe5e9d00 RDI: ffff9de2fc21b400
[ 133.452237] RBP: ffff9de2fe5e9d00 R08: 0000000000000000 R09: 0000000000000000
[ 133.453722] R10: ffff9de2fbc6be22 R11: ffff9de2fe0d6600 R12: ffff9de2fc21b400
[ 133.455149] R13: ffff9de2fe0d6628 R14: 0000000000000001 R15: 0000000000000003
[ 133.456520] FS: 00007fe014918740(0000) GS:ffff9de2fec00000(0000) knlGS:0000000000000000
[ 133.458046] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 133.459342] CR2: ffffffffffffffd6 CR3: 000000003bb72000 CR4: 00000000000006f0
[ 133.460782] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 133.462240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 133.463697] Kernel panic - not syncing: Fatal exception in interrupt
[ 133.465226] Kernel Offset: 0xfa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 133.467025] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fixes: aaa0c23cb901 ("Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug")
Signed-off-by: Martin Varghese <martin.varghese@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 41b14fb8724d5a4b382a63cb4a1a61880347ccb8 ]
Clearing the sock TX queue in sk_set_socket() might cause unexpected
out-of-order transmit when called from sock_orphan(), as outstanding
packets can pick a different TX queue and bypass the ones already queued.
This is undesired in general. More specifically, it breaks the in-order
scheduling property guarantee for device-offloaded TLS sockets.
Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
explicitly only where needed.
Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 471e39df96b9a4c4ba88a2da9e25a126624d7a9c ]
If a socket is set ipv6only, it will still send IPv4 addresses in the
INIT and INIT_ACK packets. This potentially misleads the peer into using
them, which then would cause association termination.
The fix is to not add IPv4 addresses to ipv6only sockets.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0382a25af3c771a8e4d5e417d1834cbe28c2aaac upstream.
Socket flags aren't updated atomically, so the socket must be locked
while reading the SOCK_ZAPPED flag.
This issue exists for both l2tp_ip and l2tp_ip6. For IPv6, this patch
also brings error handling for __ip6_datagram_connect() failures.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Giuliano Procida <gprocida@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 2c407aca64977ede9b9f35158e919773cae2082f ]
gcc-10 warns around a suspicious access to an empty struct member:
net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc':
net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds]
1522 | memset(&ct->__nfct_init_offset[0], 0,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from net/netfilter/nf_conntrack_core.c:37:
include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset'
90 | u8 __nfct_init_offset[0];
| ^~~~~~~~~~~~~~~~~~
The code is correct but a bit unusual. Rework it slightly in a way that
does not trigger the warning, using an empty struct instead of an empty
array. There are probably more elegant ways to do this, but this is the
smallest change.
Fixes: c41884ce0562 ("netfilter: conntrack: avoid zeroing timer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 upstream.
ipv6_stub uses the ip6_dst_lookup function to allow other modules to
perform IPv6 lookups. However, this function skips the XFRM layer
entirely.
All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
which calls xfrm_lookup_route(). This patch fixes this inconsistent
behavior by switching the stub to ip6_dst_lookup_flow, which also calls
xfrm_lookup_route().
This requires some changes in all the callers, as these two functions
take different arguments and have different return types.
Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.4:
- Drop changes in lwt_bpf.c, mlx5, and rxe
- Adjust filename, context, indentation]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c4e85f73afb6384123e5ef1bba3315b2e3ad031e upstream.
This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow,
as some modules currently pass a net argument without a socket to
ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change
ipv6_stub_impl.ipv6_dst_lookup to take net argument").
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 63c43787d35e45562a6b5927e2edc8f4783d95b8 upstream.
Since commit 1625f4529957, vti6 is broken, all input packets are dropped
(LINUX_MIB_XFRMINNOSTATES is incremented).
XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in
xfrm6_rcv_spi().
A new function xfrm6_rcv_tnl() that enables to pass a value to
xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function
is used in several handlers).
CC: Alexey Kodanev <alexey.kodanev@oracle.com>
Fixes: 1625f4529957 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5a56a0b3a45dd0cc5b2f7bec6afd053a474ed9f5 upstream.
When deleting an IP address from an interface, there is a clean-up of
routes which refer to this local address. However, there was no check to
see that the VRF matched. This meant that deletion wasn't confined to
the VRF it should have been.
To solve this, a new field has been added to fib_info to hold a table
id. When removing fib entries corresponding to a local ip address, this
table id is also used in the comparison.
The table id is populated when the fib_info is created. This was already
done in some places, but not in ip_rt_ioctl(). This has now been fixed.
Fixes: 021dd3b8a142 ("net: Add routes to the table associated with the device")
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9ab179d83b4e31ea277a123492e419067c2f129a upstream.
Vivek reported a kernel exception deleting a VRF with an active
connection through it. The root cause is that the socket has a cached
reference to a dst that is destroyed. Converting the dst_destroy to
dst_release and letting proper reference counting kick in does not
work as the dst has a reference to the device which needs to be released
as well.
I talked to Hannes about this at netdev and he pointed out the ipv4 and
ipv6 dst handling has dst_ifdown for just this scenario. Rather than
continuing with the reinvented dst wheel in VRF just remove it and
leverage the ipv4 and ipv6 versions.
Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver")
Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 830218c1add1da16519b71909e5cf21522b7d062 upstream.
rt6_add_route_info and rt6_add_dflt_router were updated to pull the FIB
table from the device index, but the corresponding rt6_get_route_info
and rt6_get_dflt_router functions were not leading to the failure to
process RA's:
ICMPv6: RA: ndisc_router_discovery failed to add default route
Fix the 'get' functions by using the table id associated with the
device when applicable.
Also, now that default routes can be added to tables other than the
default table, rt6_purge_dflt_routers needs to be updated as well to
look at all tables. To handle that efficiently, add a flag to the table
denoting if it is has a default route via RA.
Fixes: ca254490c8dfd ("net: Add VRF support to IPv6 stack")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8c2c2358b236530bc2c79b4c2a447cbdbc3d96d7 upstream.
When charging socket memory, the code currently checks only the local
page counter for excess to determine whether the memcg is under socket
pressure. But even if the local counter is fine, one of the ancestors
could have breached its limit, which should also force this child to
enter socket pressure. This currently doesn't happen.
Fix this by using page_counter_try_charge() first. If that fails, it
means that either the local counter or one of the ancestors are in
excess of their limit, and the child should enter socket pressure.
Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f87fda00b6ed232a817c655b8d179b48bde8fdbe upstream.
ether_addr_equal_64bits() requires some care about its arguments,
namely that 8 bytes might be read, even if last 2 byte values are not
used.
KASan detected a violation with null_mac_addr and lacpdu_mcast_addr
in bond_3ad.c
Same problem with mac_bcast[] and mac_v6_allmcast[] in bond_alb.c :
Although the 8-byte alignment was there, KASan would detect out
of bound accesses.
Fixes: 815117adaf5b ("bonding: use ether_addr_equal_unaligned for bond addr compare")
Fixes: bb54e58929f3 ("bonding: Verify RX LACPDU has proper dest mac-addr")
Fixes: 885a136c52a8 ("bonding: use compare_ether_addr_64bits() in ALB")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a27758ffaf96f89002129eedb2cc172d254099f8 upstream.
For gso_skb we only update qlen, backlog should be updated too.
Note, it is correct to just update these stats at one layer,
because the gso_skb is cached there.
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4f0c40d94461cfd23893a17335b2ab78ecb333c8 upstream.
Dccp verifies packet integrity, including length, at initial rcv in
dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.
A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
skb_copy_datagram_msg interprets this as a negative value, so
(correctly) fails with EFAULT. The negative length is reported in
ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.
Introduce an sk_receive_skb variant that caps how small a filter
program can trim packets, and call this in dccp with the header
length. Excessively trimmed packets are now processed normally and
queued for reception as 0B payloads.
Fixes: 7c657876b63c ("[DCCP]: Initial implementation")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 9bacd256f1354883d3c1402655153367982bba49 ]
TCP stack is dumb in how it cooks its output packets.
Depending on MAX_HEADER value, we might chose a bad ending point
for the headers.
If we align the end of TCP headers to cache line boundary, we
make sure to always use the smallest number of cache lines,
which always help.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 03e2a984b6165621f287fadf5f4b5cd8b58dcaba ]
The behaviour for what is considered an anycast address changed in
commit 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after
encountering pmtu exception"). This now considers the first
address in a subnet where there is a route via a gateway
to be an anycast address.
This breaks path MTU discovery and traceroutes when a host in a
remote network uses the address at the start of a prefix
(eg 2600:: advertised as 2600::/48 in the DFZ) as ICMP errors
will not be sent to anycast addresses.
This patch excludes any routes with a gateway, or via point to
point links, like the behaviour previously from
rt6_is_gw_or_nonexthop in net/ipv6/route.c.
This can be tested with:
ip link add v1 type veth peer name v2
ip netns add test
ip netns exec test ip link set lo up
ip link set v2 netns test
ip link set v1 up
ip netns exec test ip link set v2 up
ip addr add 2001:db8::1/64 dev v1 nodad
ip addr add 2001:db8:100:: dev lo nodad
ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad
ip netns exec test ip route add unreachable 2001:db8:1::1
ip netns exec test ip route add 2001:db8:100::/64 via 2001:db8::1
ip netns exec test sysctl net.ipv6.conf.all.forwarding=1
ip route add 2001:db8:1::1 via 2001:db8::2
ping -I 2001:db8::1 2001:db8:1::1 -c1
ping -I 2001:db8:100:: 2001:db8:1::1 -c1
ip addr delete 2001:db8:100:: dev lo
ip netns delete test
Currently the first ping will get back a destination unreachable ICMP
error, but the second will never get a response, with "icmp6_send:
acast source" logged. After this patch, both get destination
unreachable ICMP replies.
Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Tim Stallard <code@timstallard.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4c16d64ea04056f1b1b324ab6916019f6a064114 ]
Add missing netlink policy entry for FRA_TUN_ID.
Fixes: e7030878fc84 ("fib: Add fib rule match on tunnel id")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8a9093c79863b58cc2f9874d7ae788f0d622a596 ]
tc flower rules that are based on src or dst port blocking are sometimes
ineffective due to uninitialized stack data. __skb_flow_dissect() extracts
ports from the skb for tc flower to match against. However, the port
dissection is not done when when the FLOW_DIS_IS_FRAGMENT bit is set in
key_control->flags. All callers of __skb_flow_dissect(), zero-out the
key_control field except for fl_classify() as used by the flower
classifier. Thus, the FLOW_DIS_IS_FRAGMENT may be set on entry to
__skb_flow_dissect(), since key_control is allocated on the stack
and may not be initialized.
Since key_basic and key_control are present for all flow keys, let's
make sure they are initialized.
Fixes: 62230715fd24 ("flow_dissector: do not dissect l4 ports for fragments")
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 30ca1aa536211f5ac3de0173513a7a99a98a97f3 upstream.
Make ieee80211_send_layer2_update() a common function so other drivers
can re-use it.
Signed-off-by: Dedy Lansky <dlansky@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[bwh: Backported to 4.4 as dependency of commit 3e493173b784
"mac80211: Do not send Layer 2 Update frame before authorization":
- Retain type-casting of skb_put() return value
- Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c305c6ae79e2ce20c22660ceda94f0d86d639a82 ]
KCSAN reported a data-race [1]
While we can use READ_ONCE() on the read sides,
we need to make sure hh->hh_len is written last.
[1]
BUG: KCSAN: data-race in eth_header_cache / neigh_resolve_output
write to 0xffff8880b9dedcb8 of 4 bytes by task 29760 on cpu 0:
eth_header_cache+0xa9/0xd0 net/ethernet/eth.c:247
neigh_hh_init net/core/neighbour.c:1463 [inline]
neigh_resolve_output net/core/neighbour.c:1480 [inline]
neigh_resolve_output+0x415/0x470 net/core/neighbour.c:1470
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
NF_HOOK include/linux/netfilter.h:305 [inline]
ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
read to 0xffff8880b9dedcb8 of 4 bytes by task 29572 on cpu 1:
neigh_resolve_output net/core/neighbour.c:1479 [inline]
neigh_resolve_output+0x113/0x470 net/core/neighbour.c:1470
neigh_output include/net/neighbour.h:511 [inline]
ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
__ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
__ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
dst_output include/net/dst.h:436 [inline]
NF_HOOK include/linux/netfilter.h:305 [inline]
ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 29572 Comm: kworker/1:4 Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events rt6_probe_deferred
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 258a980d1ec23e2c786e9536a7dd260bea74bae6 ]
When storing a pointer to a dst_metrics structure in dst_entry._metrics,
two flags are added in the least significant bits of the pointer value.
Hence this assumes all pointers to dst_metrics structures have at least
4-byte alignment.
However, on m68k, the minimum alignment of 32-bit values is 2 bytes, not
4 bytes. Hence in some kernel builds, dst_default_metrics may be only
2-byte aligned, leading to obscure boot warnings like:
WARNING: CPU: 0 PID: 7 at lib/refcount.c:28 refcount_warn_saturate+0x44/0x9a
refcount_t: underflow; use-after-free.
Modules linked in:
CPU: 0 PID: 7 Comm: ksoftirqd/0 Tainted: G W 5.5.0-rc2-atari-01448-g114a1a1038af891d-dirty #261
Stack from 10835e6c:
10835e6c 0038134f 00023fa6 00394b0f 0000001c 00000009 00321560 00023fea
00394b0f 0000001c 001a70f8 00000009 00000000 10835eb4 00000001 00000000
04208040 0000000a 00394b4a 10835ed4 00043aa8 001a70f8 00394b0f 0000001c
00000009 00394b4a 0026aba8 003215a4 00000003 00000000 0026d5a8 00000001
003215a4 003a4361 003238d6 000001f0 00000000 003215a4 10aa3b00 00025e84
003ddb00 10834000 002416a8 10aa3b00 00000000 00000080 000aa038 0004854a
Call Trace: [<00023fa6>] __warn+0xb2/0xb4
[<00023fea>] warn_slowpath_fmt+0x42/0x64
[<001a70f8>] refcount_warn_saturate+0x44/0x9a
[<00043aa8>] printk+0x0/0x18
[<001a70f8>] refcount_warn_saturate+0x44/0x9a
[<0026aba8>] refcount_sub_and_test.constprop.73+0x38/0x3e
[<0026d5a8>] ipv4_dst_destroy+0x5e/0x7e
[<00025e84>] __local_bh_enable_ip+0x0/0x8e
[<002416a8>] dst_destroy+0x40/0xae
Fix this by forcing 4-byte alignment of all dst_metrics structures.
Fixes: e5fd387ad5b30ca3 ("ipv6: do not overwrite inetpeer metrics prematurely")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 721c8dafad26ccfa90ff659ee19755e3377b829d ]
Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
timestamp of the last synflood. Protect them with READ_ONCE() and
WRITE_ONCE() since reads and writes aren't serialised.
Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
introduced by a0f82f64e269 ("syncookies: remove last_synq_overflow from
struct tcp_sock"). But unprotected accesses were already there when
timestamp was stored in .last_synq_overflow.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cb44a08f8647fd2e8db5cc9ac27cd8355fa392d8 ]
When no synflood occurs, the synflood timestamp isn't updated.
Therefore it can be so old that time_after32() can consider it to be
in the future.
That's a problem for tcp_synq_no_recent_overflow() as it may report
that a recent overflow occurred while, in fact, it's just that jiffies
has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
Spurious detection of recent overflows lead to extra syncookie
verification in cookie_v[46]_check(). At that point, the verification
should fail and the packet dropped. But we should have dropped the
packet earlier as we didn't even send a syncookie.
Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
only if jiffies is within the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
way, no spurious recent overflow is reported when jiffies wraps and
'last_overflow' becomes in the future from the point of view of
time_after32().
However, if jiffies wraps and enters the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
'last_overflow' being a stale synflood timestamp), then
tcp_synq_no_recent_overflow() still erroneously reports an
overflow. In such cases, we have to rely on syncookie verification
to drop the packet. We unfortunately have no way to differentiate
between a fresh and a stale syncookie timestamp.
In practice, using last_overflow as lower bound is problematic.
If the synflood timestamp is concurrently updated between the time
we read jiffies and the moment we store the timestamp in
'last_overflow', then 'now' becomes smaller than 'last_overflow' and
tcp_synq_no_recent_overflow() returns true, potentially dropping a
valid syncookie.
Reading jiffies after loading the timestamp could fix the problem,
but that'd require a memory barrier. Let's just accommodate for
potential timestamp growth instead and extend the interval using
'last_overflow - HZ' as lower bound.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 04d26e7b159a396372646a480f4caa166d1b6720 ]
If no synflood happens for a long enough period of time, then the
synflood timestamp isn't refreshed and jiffies can advance so much
that time_after32() can't accurately compare them any more.
Therefore, we can end up in a situation where time_after32(now,
last_overflow + HZ) returns false, just because these two values are
too far apart. In that case, the synflood timestamp isn't updated as
it should be, which can trick tcp_synq_no_recent_overflow() into
rejecting valid syncookies.
For example, let's consider the following scenario on a system
with HZ=1000:
* The synflood timestamp is 0, either because that's the timestamp
of the last synflood or, more commonly, because we're working with
a freshly created socket.
* We receive a new SYN, which triggers synflood protection. Let's say
that this happens when jiffies == 2147484649 (that is,
'synflood timestamp' + HZ + 2^31 + 1).
* Then tcp_synq_overflow() doesn't update the synflood timestamp,
because time_after32(2147484649, 1000) returns false.
With:
- 2147484649: the value of jiffies, aka. 'now'.
- 1000: the value of 'last_overflow' + HZ.
* A bit later, we receive the ACK completing the 3WHS. But
cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
says that we're not under synflood. That's because
time_after32(2147484649, 120000) returns false.
With:
- 2147484649: the value of jiffies, aka. 'now'.
- 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.
Of course, in reality jiffies would have increased a bit, but this
condition will last for the next 119 seconds, which is far enough
to accommodate for jiffie's growth.
Fix this by updating the overflow timestamp whenever jiffies isn't
within the [last_overflow, last_overflow + HZ] range. That shouldn't
have any performance impact since the update still happens at most once
per second.
Now we're guaranteed to have fresh timestamps while under synflood, so
tcp_synq_no_recent_overflow() can safely use it with time_after32() in
such situations.
Stale timestamps can still make tcp_synq_no_recent_overflow() return
the wrong verdict when not under synflood. This will be handled in the
next patch.
For 64 bits architectures, the problem was introduced with the
conversion of ->tw_ts_recent_stamp to 32 bits integer by commit
cca9bab1b72c ("tcp: use monotonic timestamps for PAWS").
The problem has always been there on 32 bits architectures.
Fixes: cca9bab1b72c ("tcp: use monotonic timestamps for PAWS")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 501a90c945103e8627406763dac418f20f3837b2 ]
syzbot was once again able to crash a host by setting a very small mtu
on loopback device.
Let's make inetdev_valid_mtu() available in include/net/ip.h,
and use it in ip_setup_cork(), so that we protect both ip_append_page()
and __ip_append_data()
Also add a READ_ONCE() when the device mtu is read.
Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
even if other code paths might write over this field.
Add a big comment in include/linux/netdevice.h about dev->mtu
needing READ_ONCE()/WRITE_ONCE() annotations.
Hopefully we will add the missing ones in followup patches.
[1]
refcount_t: saturated; leaking memory.
WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
panic+0x2e3/0x75c kernel/panic.c:221
__warn.cold+0x2f/0x3e kernel/panic.c:582
report_bug+0x289/0x300 lib/bug.c:195
fixup_bug arch/x86/kernel/traps.c:174 [inline]
fixup_bug arch/x86/kernel/traps.c:169 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
RSP: 0018:ffff88809689f550 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
refcount_add include/linux/refcount.h:193 [inline]
skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
kernel_sendpage+0x92/0xf0 net/socket.c:3794
sock_sendpage+0x8b/0xc0 net/socket.c:936
pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
splice_from_pipe_feed fs/splice.c:512 [inline]
__splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
splice_from_pipe+0x108/0x170 fs/splice.c:671
generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
do_splice_from fs/splice.c:861 [inline]
direct_splice_actor+0x123/0x190 fs/splice.c:1035
splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
do_sendfile+0x597/0xd00 fs/read_write.c:1464
__do_sys_sendfile64 fs/read_write.c:1525 [inline]
__se_sys_sendfile64 fs/read_write.c:1511 [inline]
__x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441409
Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..
Fixes: 1470ddf7f8ce ("inet: Remove explicit write references to sk/inet in ip_append_data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5bf325a53202b8728cf7013b72688c46071e212e ]
With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.
They would increase their share of the memory, instead
of decreasing it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9708d2b5b7c648e8e0a40d11e8cea12f6277f33c ]
llc_sap_close() is called by llc_sap_put() which
could be called in BH context in llc_rcv(). We can't
block in BH.
There is no reason to block it here, kfree_rcu() should
be sufficient.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f75359f3ac855940c5718af10ba089b8977bf339 ]
Add a couple of READ_ONCE() and WRITE_ONCE() to prevent
load-tearing and store-tearing in sock_read_timestamp()
and sock_write_timestamp()
This might prevent another KCSAN report.
Fixes: 3a0ed3e96197 ("sock: Make sock->sk_stamp thread-safe")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit c24b75e0f9239e78105f81c5f03a751641eb07ef ]
syzbot reported the following issue :
BUG: KCSAN: data-race in update_defense_level / update_defense_level
read to 0xffffffff861a6260 of 4 bytes by task 3006 on cpu 1:
update_defense_level+0x621/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:177
defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
write to 0xffffffff861a6260 of 4 bytes by task 7333 on cpu 0:
update_defense_level+0xa62/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:205
defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 7333 Comm: kworker/0:5 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events defense_work_handler
Indeed, old_secure_tcp is currently a static variable, while it
needs to be a per netns variable.
Fixes: a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 250367c59e6ba0d79d702a059712d66edacd4a1a upstream.
Invoking the following commands on a 32-bit architecture with strict
alignment requirements (such as an ARMv7-based Raspberry Pi) results
in an alignment exception:
# nft add table ip test-ip4
# nft add chain ip test-ip4 output { type filter hook output priority 0; }
# nft add rule ip test-ip4 output quota 1025 bytes
Alignment trap: not handling instruction e1b26f9f at [<7f4473f8>]
Unhandled fault: alignment exception (0x001) at 0xb832e824
Internal error: : 1 [#1] PREEMPT SMP ARM
Hardware name: BCM2835
[<7f4473fc>] (nft_quota_do_init [nft_quota])
[<7f447448>] (nft_quota_init [nft_quota])
[<7f4260d0>] (nf_tables_newrule [nf_tables])
[<7f4168dc>] (nfnetlink_rcv_batch [nfnetlink])
[<7f416bd0>] (nfnetlink_rcv [nfnetlink])
[<8078b334>] (netlink_unicast)
[<8078b664>] (netlink_sendmsg)
[<8071b47c>] (sock_sendmsg)
[<8071bd18>] (___sys_sendmsg)
[<8071ce3c>] (__sys_sendmsg)
[<8071ce94>] (sys_sendmsg)
The reason is that nft_quota_do_init() calls atomic64_set() on an
atomic64_t which is only aligned to 32-bit, not 64-bit, because it
succeeds struct nft_expr in memory which only contains a 32-bit pointer.
Fix by aligning the nft_expr private data to 64-bit.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1b53d64435d56902fc234ff2507142d971a09687 ]
KCSAN reported the following data-race [1]
The fix will also prevent the compiler from optimizing out
the condition.
[1]
BUG: KCSAN: data-race in neigh_resolve_output / neigh_resolve_output
write to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 1:
neigh_event_send include/net/neighbour.h:443 [inline]
neigh_resolve_output+0x78/0x480 net/core/neighbour.c:1474
neigh_output include/net/neighbour.h:511 [inline]
ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228
__ip_finish_output net/ipv4/ip_output.c:308 [inline]
__ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290
ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip_output+0xdf/0x210 net/ipv4/ip_output.c:432
dst_output include/net/dst.h:436 [inline]
ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125
__ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
ip_queue_xmit+0x45/0x60 include/net/ip.h:237
__tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
__tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976
tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999
tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515
tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598
tcp_write_timer+0xd1/0xf0 net/ipv4/tcp_timer.c:618
read to 0xffff8880a41dba78 of 8 bytes by interrupt on cpu 0:
neigh_event_send include/net/neighbour.h:442 [inline]
neigh_resolve_output+0x57/0x480 net/core/neighbour.c:1474
neigh_output include/net/neighbour.h:511 [inline]
ip_finish_output2+0x4af/0xe40 net/ipv4/ip_output.c:228
__ip_finish_output net/ipv4/ip_output.c:308 [inline]
__ip_finish_output+0x23a/0x490 net/ipv4/ip_output.c:290
ip_finish_output+0x41/0x160 net/ipv4/ip_output.c:318
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip_output+0xdf/0x210 net/ipv4/ip_output.c:432
dst_output include/net/dst.h:436 [inline]
ip_local_out+0x74/0x90 net/ipv4/ip_output.c:125
__ip_queue_xmit+0x3a8/0xa40 net/ipv4/ip_output.c:532
ip_queue_xmit+0x45/0x60 include/net/ip.h:237
__tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
__tcp_retransmit_skb+0x4bd/0x15f0 net/ipv4/tcp_output.c:2976
tcp_retransmit_skb+0x36/0x1a0 net/ipv4/tcp_output.c:2999
tcp_retransmit_timer+0x719/0x16d0 net/ipv4/tcp_timer.c:515
tcp_write_timer_handler+0x42d/0x510 net/ipv4/tcp_timer.c:598
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 55667441c84fa5e0911a0aac44fb059c15ba6da2 upstream.
UDP IPv6 packets auto flowlabels are using a 32bit secret
(static u32 hashrnd in net/core/flow_dissector.c) and
apply jhash() over fields known by the receivers.
Attackers can easily infer the 32bit secret and use this information
to identify a device and/or user, since this 32bit secret is only
set at boot time.
Really, using jhash() to generate cookies sent on the wire
is a serious security concern.
Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be
a dead end. Trying to periodically change the secret (like in sch_sfq.c)
could change paths taken in the network for long lived flows.
Let's switch to siphash, as we did in commit df453700e8d8
("inet: switch IP ID generator to siphash")
Using a cryptographically strong pseudo random function will solve this
privacy issue and more generally remove other weak points in the stack.
Packet schedulers using skb_get_hash_perturb() benefit from this change.
Fixes: b56774163f99 ("ipv6: Enable auto flow labels by default")
Fixes: 42240901f7c4 ("ipv6: Implement different admin modes for automatic flow labels")
Fixes: 67800f9b1f4e ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
Fixes: cb1ce2ef387b ("ipv6: Implement automatic flow label generation on transmit")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jonathan Berger <jonathann1@walla.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 20eb4f29b60286e0d6dc01d9c260b4bd383c58fb ]
sk_page_frag() optimizes skb_frag allocations by using per-task
skb_frag cache when it knows it's the only user. The condition is
determined by seeing whether the socket allocation mask allows
blocking - if the allocation may block, it obviously owns the task's
context and ergo exclusively owns current->task_frag.
Unfortunately, this misses recursion through memory reclaim path.
Please take a look at the following backtrace.
[2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10
...
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x30/0x40
sock_xmit.isra.24+0xa1/0x170 [nbd]
nbd_send_cmd+0x1d2/0x690 [nbd]
nbd_queue_rq+0x1b5/0x3b0 [nbd]
__blk_mq_try_issue_directly+0x108/0x1b0
blk_mq_request_issue_directly+0xbd/0xe0
blk_mq_try_issue_list_directly+0x41/0xb0
blk_mq_sched_insert_requests+0xa2/0xe0
blk_mq_flush_plug_list+0x205/0x2a0
blk_flush_plug_list+0xc3/0xf0
[1] blk_finish_plug+0x21/0x2e
_xfs_buf_ioapply+0x313/0x460
__xfs_buf_submit+0x67/0x220
xfs_buf_read_map+0x113/0x1a0
xfs_trans_read_buf_map+0xbf/0x330
xfs_btree_read_buf_block.constprop.42+0x95/0xd0
xfs_btree_lookup_get_block+0x95/0x170
xfs_btree_lookup+0xcc/0x470
xfs_bmap_del_extent_real+0x254/0x9a0
__xfs_bunmapi+0x45c/0xab0
xfs_bunmapi+0x15/0x30
xfs_itruncate_extents_flags+0xca/0x250
xfs_free_eofblocks+0x181/0x1e0
xfs_fs_destroy_inode+0xa8/0x1b0
destroy_inode+0x38/0x70
dispose_list+0x35/0x50
prune_icache_sb+0x52/0x70
super_cache_scan+0x120/0x1a0
do_shrink_slab+0x120/0x290
shrink_slab+0x216/0x2b0
shrink_node+0x1b6/0x4a0
do_try_to_free_pages+0xc6/0x370
try_to_free_mem_cgroup_pages+0xe3/0x1e0
try_charge+0x29e/0x790
mem_cgroup_charge_skmem+0x6a/0x100
__sk_mem_raise_allocated+0x18e/0x390
__sk_mem_schedule+0x2a/0x40
[0] tcp_sendmsg_locked+0x8eb/0xe10
tcp_sendmsg+0x27/0x40
sock_sendmsg+0x30/0x40
___sys_sendmsg+0x26d/0x2b0
__sys_sendmsg+0x57/0xa0
do_syscall_64+0x42/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
In [0], tcp_send_msg_locked() was using current->page_frag when it
called sk_wmem_schedule(). It already calculated how many bytes can
be fit into current->page_frag. Due to memory pressure,
sk_wmem_schedule() called into memory reclaim path which called into
xfs and then IO issue path. Because the filesystem in question is
backed by nbd, the control goes back into the tcp layer - back into
tcp_sendmsg_locked().
nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes
sense - it's in the process of freeing memory and wants to be able to,
e.g., drop clean pages to make forward progress. However, this
confused sk_page_frag() called from [2]. Because it only tests
whether the allocation allows blocking which it does, it now thinks
current->page_frag can be used again although it already was being
used in [0].
After [2] used current->page_frag, the offset would be increased by
the used amount. When the control returns to [0],
current->page_frag's offset is increased and the previously calculated
number of bytes now may overrun the end of allocated memory leading to
silent memory corruptions.
Fix it by adding gfpflags_normal_context() which tests sleepable &&
!reclaim and use it to determine whether to use current->task_frag.
v2: Eric didn't like gfp flags being tested twice. Introduce a new
helper gfpflags_normal_context() and combine the two tests.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 644fbdeacf1d3edd366e44b8ba214de9d1dd66a9 upstream.
Now sctp uses inet_dgram_connect as its proto_ops .connect, and the flags
param can't be passed into its proto .connect where this flags is really
needed.
sctp works around it by getting flags from socket file in __sctp_connect.
It works for connecting from userspace, as inherently the user sock has
socket file and it passes f_flags as the flags param into the proto_ops
.connect.
However, the sock created by sock_create_kern doesn't have a socket file,
and it passes the flags (like O_NONBLOCK) by using the flags param in
kernel_connect, which calls proto_ops .connect later.
So to fix it, this patch defines a new proto_ops .connect for sctp,
sctp_inet_connect, which calls __sctp_connect() directly with this
flags param. After this, the sctp's proto .connect can be removed.
Note that sctp_inet_connect doesn't need to do some checks that are not
needed for sctp, which makes thing better than with inet_dgram_connect.
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 159d2c7d8106177bd9a986fd005a311fe0d11285 upstream.
qdisc_root() use from netem_enqueue() triggers a lockdep warning.
__dev_queue_xmit() uses rcu_read_lock_bh() which is
not equivalent to rcu_read_lock() + local_bh_disable_bh as far
as lockdep is concerned.
WARNING: suspicious RCU usage
5.3.0-rc7+ #0 Not tainted
-----------------------------
include/net/sch_generic.h:492 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
3 locks held by syz-executor427/8855:
#0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: lwtunnel_xmit_redirect include/net/lwtunnel.h:92 [inline]
#0: 00000000b5525c01 (rcu_read_lock_bh){....}, at: ip_finish_output2+0x2dc/0x2570 net/ipv4/ip_output.c:214
#1: 00000000b5525c01 (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x20a/0x3650 net/core/dev.c:3804
#2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: spin_lock include/linux/spinlock.h:338 [inline]
#2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_xmit_skb net/core/dev.c:3502 [inline]
#2: 00000000364bae92 (&(&sch->q.lock)->rlock){+.-.}, at: __dev_queue_xmit+0x14b8/0x3650 net/core/dev.c:3838
stack backtrace:
CPU: 0 PID: 8855 Comm: syz-executor427 Not tainted 5.3.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5357
qdisc_root include/net/sch_generic.h:492 [inline]
netem_enqueue+0x1cfb/0x2d80 net/sched/sch_netem.c:479
__dev_xmit_skb net/core/dev.c:3527 [inline]
__dev_queue_xmit+0x15d2/0x3650 net/core/dev.c:3838
dev_queue_xmit+0x18/0x20 net/core/dev.c:3902
neigh_hh_output include/net/neighbour.h:500 [inline]
neigh_output include/net/neighbour.h:509 [inline]
ip_finish_output2+0x1726/0x2570 net/ipv4/ip_output.c:228
__ip_finish_output net/ipv4/ip_output.c:308 [inline]
__ip_finish_output+0x5fc/0xb90 net/ipv4/ip_output.c:290
ip_finish_output+0x38/0x1f0 net/ipv4/ip_output.c:318
NF_HOOK_COND include/linux/netfilter.h:294 [inline]
ip_mc_output+0x292/0xf40 net/ipv4/ip_output.c:417
dst_output include/net/dst.h:436 [inline]
ip_local_out+0xbb/0x190 net/ipv4/ip_output.c:125
ip_send_skb+0x42/0xf0 net/ipv4/ip_output.c:1555
udp_send_skb.isra.0+0x6b2/0x1160 net/ipv4/udp.c:887
udp_sendmsg+0x1e96/0x2820 net/ipv4/udp.c:1174
inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:637 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:657
___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
__sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
__do_sys_sendmmsg net/socket.c:2442 [inline]
__se_sys_sendmmsg net/socket.c:2439 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439
do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b74555de21acd791f12c4a1aeaf653dd7ac21133 upstream.
syzbot reported:
BUG: memory leak
unreferenced object 0xffff88811eb3de00 (size 224):
comm "syz-executor559", pid 7315, jiffies 4294943019 (age 10.300s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 a0 38 24 81 88 ff ff 00 c0 f2 15 81 88 ff ff ..8$............
backtrace:
[<000000008d1c66a1>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
[<000000008d1c66a1>] slab_post_alloc_hook mm/slab.h:439 [inline]
[<000000008d1c66a1>] slab_alloc_node mm/slab.c:3269 [inline]
[<000000008d1c66a1>] kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579
[<00000000447d9496>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:198
[<000000000cdbf82f>] alloc_skb include/linux/skbuff.h:1058 [inline]
[<000000000cdbf82f>] llc_alloc_frame+0x66/0x110 net/llc/llc_sap.c:54
[<000000002418b52e>] llc_conn_ac_send_sabme_cmd_p_set_x+0x2f/0x140 net/llc/llc_c_ac.c:777
[<000000001372ae17>] llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline]
[<000000001372ae17>] llc_conn_service net/llc/llc_conn.c:400 [inline]
[<000000001372ae17>] llc_conn_state_process+0x1ac/0x640 net/llc/llc_conn.c:75
[<00000000f27e53c1>] llc_establish_connection+0x110/0x170 net/llc/llc_if.c:109
[<00000000291b2ca0>] llc_ui_connect+0x10e/0x370 net/llc/af_llc.c:477
[<000000000f9c740b>] __sys_connect+0x11d/0x170 net/socket.c:1840
[...]
The bug is that most callers of llc_conn_send_pdu() assume it consumes a
reference to the skb, when actually due to commit b85ab56c3f81 ("llc:
properly handle dev_queue_xmit() return value") it doesn't.
Revert most of that commit, and instead make the few places that need
llc_conn_send_pdu() to *not* consume a reference call skb_get() before.
Fixes: b85ab56c3f81 ("llc: properly handle dev_queue_xmit() return value")
Reported-by: syzbot+6b825a6494a04cc0e3f7@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit dbb2483b2a46fbaf833cfb5deb5ed9cace9c7399 upstream.
In commit 6a53b7593233 ("xfrm: check id proto in validate_tmpl()")
I introduced a check for xfrm protocol, but according to Herbert
IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
it should be removed from validate_tmpl().
And, IPSEC_PROTO_ANY is expected to only match 3 IPSec-specific
protocols, this is why xfrm_state_flush() could still miss
IPPROTO_ROUTING, which leads that those entries are left in
net->xfrm.state_all before exit net. Fix this by replacing
IPSEC_PROTO_ANY with zero.
This patch also extracts the check from validate_tmpl() to
xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
With this, no other protocols should be added into xfrm.
Fixes: 6a53b7593233 ("xfrm: check id proto in validate_tmpl()")
Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
triggers following stack trace:
[25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
[25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
[25244.888167] Call Trace:
[25244.889182] <IRQ>
[25244.890001] tcp_fragment+0x9c/0x2cf
[25244.891295] tcp_write_xmit+0x68f/0x988
[25244.892732] __tcp_push_pending_frames+0x3b/0xa0
[25244.894347] tcp_data_snd_check+0x2a/0xc8
[25244.895775] tcp_rcv_established+0x2a8/0x30d
[25244.897282] tcp_v4_do_rcv+0xb2/0x158
[25244.898666] tcp_v4_rcv+0x692/0x956
[25244.899959] ip_local_deliver_finish+0xeb/0x169
[25244.901547] __netif_receive_skb_core+0x51c/0x582
[25244.903193] ? inet_gro_receive+0x239/0x247
[25244.904756] netif_receive_skb_internal+0xab/0xc6
[25244.906395] napi_gro_receive+0x8a/0xc0
[25244.907760] receive_buf+0x9a1/0x9cd
[25244.909160] ? load_balance+0x17a/0x7b7
[25244.910536] ? vring_unmap_one+0x18/0x61
[25244.911932] ? detach_buf+0x60/0xfa
[25244.913234] virtnet_poll+0x128/0x1e1
[25244.914607] net_rx_action+0x12a/0x2b1
[25244.915953] __do_softirq+0x11c/0x26b
[25244.917269] ? handle_irq_event+0x44/0x56
[25244.918695] irq_exit+0x61/0xa0
[25244.919947] do_IRQ+0x9d/0xbb
[25244.921065] common_interrupt+0x85/0x85
[25244.922479] </IRQ>
tcp_rtx_queue_tail() (called by tcp_fragment()) can call
tcp_write_queue_prev() on the first packet in the queue, which will trigger
the BUG in tcp_write_queue_prev(), because there is no previous packet.
This happens when the retransmit queue is empty, for example in case of a
zero window.
Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()") was not a
simple cherry-pick of the original one from master (b617158dc096)
because there is a specific TCP rtx queue only since v4.15. For more
details, please see the commit message of b617158dc096 ("tcp: be more
careful in tcp_fragment()").
The BUG() is hit due to the specific code added to versions older than
v4.15. The comment in skb_queue_prev() (include/linux/skbuff.h:1406),
just before the BUG_ON() somehow suggests to add a check before using
it, what Tim did.
In master, this code path causing the issue will not be taken because
the implementation of tcp_rtx_queue_tail() is different:
tcp_fragment() → tcp_rtx_queue_tail() → tcp_write_queue_prev() →
skb_queue_prev() → BUG_ON()
Fixes: 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 3c79107631db1f7fd32cf3f7368e4672004a3010 upstream.
else, we leak the addresses to userspace via ctnetlink events
and dumps.
Compute an ID on demand based on the immutable parts of nf_conn struct.
Another advantage compared to using an address is that there is no
immediate re-use of the same ID in case the conntrack entry is freed and
reallocated again immediately.
Fixes: 3583240249ef ("[NETFILTER]: nf_conntrack_expect: kill unique ID")
Fixes: 7f85f914721f ("[NETFILTER]: nf_conntrack: kill unique ID")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit df453700e8d81b1bdafdf684365ee2b9431fb702 upstream.
According to Amit Klein and Benny Pinkas, IP ID generation is too weak
and might be used by attackers.
Even with recent net_hash_mix() fix (netns: provide pure entropy for net_hash_mix())
having 64bit key and Jenkins hash is risky.
It is time to switch to siphash and its 128bit keys.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b617158dc096709d8600c53b6052144d12b89fab ]
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.
We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.
It also adds the some room due to the fact that tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15
Note for < 4.15 backports :
tcp_rtx_queue_tail() will probably look like :
static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
struct sk_buff *skb = tcp_send_head(sk);
return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}
Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Andrew Prout <aprout@ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Cc: Jonathan Looney <jtl@netflix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dbbf2d1e4077bab0c65ece2765d3fc69cf7d610f ]
tcp_write_queue_purge clears all the SKBs in the write queue
but does not reset the sk_send_head. As a result, we can have
a NULL pointer dereference anywhere that we use tcp_send_head
instead of the tcp_write_queue_tail.
For example, after a27fd7a8ed38 (tcp: purge write queue upon RST),
we can purge the write queue on RST. Prior to
75c119afe14f (tcp: implement rb-tree based retransmit queue),
tcp_push will only check tcp_send_head and then accesses
tcp_write_queue_tail to send the actual SKB. As a result, it will
dereference a NULL pointer.
This has been reported twice for 4.14 where we don't have
75c119afe14f:
By Timofey Titovets:
[ 422.081094] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000038
[ 422.081254] IP: tcp_push+0x42/0x110
[ 422.081314] PGD 0 P4D 0
[ 422.081364] Oops: 0002 [#1] SMP PTI
By Yongjian Xu:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: tcp_push+0x48/0x120
PGD 80000007ff77b067 P4D 80000007ff77b067 PUD 7fd989067 PMD 0
Oops: 0002 [#18] SMP PTI
Modules linked in: tcp_diag inet_diag tcp_bbr sch_fq iTCO_wdt
iTCO_vendor_support pcspkr ixgbe mdio i2c_i801 lpc_ich joydev input_leds shpchp
e1000e igb dca ptp pps_core hwmon mei_me mei ipmi_si ipmi_msghandler sg ses
scsi_transport_sas enclosure ext4 jbd2 mbcache sd_mod ahci libahci megaraid_sas
wmi ast ttm dm_mirror dm_region_hash dm_log dm_mod dax
CPU: 6 PID: 14156 Comm: [ET_NET 6] Tainted: G D 4.14.26-1.el6.x86_64 #1
Hardware name: LENOVO ThinkServer RD440 /ThinkServer RD440, BIOS A0TS80A
09/22/2014
task: ffff8807d78d8140 task.stack: ffffc9000e944000
RIP: 0010:tcp_push+0x48/0x120
RSP: 0018:ffffc9000e947a88 EFLAGS: 00010246
RAX: 00000000000005b4 RBX: ffff880f7cce9c00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000040 RDI: ffff8807d00f5000
RBP: ffffc9000e947aa8 R08: 0000000000001c84 R09: 0000000000000000
R10: ffff8807d00f5158 R11: 0000000000000000 R12: ffff8807d00f5000
R13: 0000000000000020 R14: 00000000000256d4 R15: 0000000000000000
FS: 00007f5916de9700(0000) GS:ffff88107fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 00000007f8226004 CR4: 00000000001606e0
Call Trace:
tcp_sendmsg_locked+0x33d/0xe50
tcp_sendmsg+0x37/0x60
inet_sendmsg+0x39/0xc0
sock_sendmsg+0x49/0x60
sock_write_iter+0xb6/0x100
do_iter_readv_writev+0xec/0x130
? rw_verify_area+0x49/0xb0
do_iter_write+0x97/0xd0
vfs_writev+0x7e/0xe0
? __wake_up_common_lock+0x80/0xa0
? __fget_light+0x2c/0x70
? __do_page_fault+0x1e7/0x530
do_writev+0x60/0xf0
? inet_shutdown+0xac/0x110
SyS_writev+0x10/0x20
do_syscall_64+0x6f/0x140
? prepare_exit_to_usermode+0x8b/0xa0
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x3135ce0c57
RSP: 002b:00007f5916de4b00 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000003135ce0c57
RDX: 0000000000000002 RSI: 00007f5916de4b90 RDI: 000000000000606f
RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f5916de8c38
R10: 0000000000000000 R11: 0000000000000293 R12: 00000000000464cc
R13: 00007f5916de8c30 R14: 00007f58d8bef080 R15: 0000000000000002
Code: 48 8b 97 60 01 00 00 4c 8d 97 58 01 00 00 41 b9 00 00 00 00 41 89 f3 4c 39
d2 49 0f 44 d1 41 81 e3 00 80 00 00 0f 85 b0 00 00 00 <80> 4a 38 08 44 8b 8f 74
06 00 00 44 89 8f 7c 06 00 00 83 e6 01
RIP: tcp_push+0x48/0x120 RSP: ffffc9000e947a88
CR2: 0000000000000038
---[ end trace 8d545c2e93515549 ]---
There is other scenario which found in stable 4.4:
Allocated:
[<ffffffff82f380a6>] __alloc_skb+0xe6/0x600 net/core/skbuff.c:218
[<ffffffff832466c3>] alloc_skb_fclone include/linux/skbuff.h:856 [inline]
[<ffffffff832466c3>] sk_stream_alloc_skb+0xa3/0x5d0 net/ipv4/tcp.c:833
[<ffffffff83249164>] tcp_sendmsg+0xd34/0x2b00 net/ipv4/tcp.c:1178
[<ffffffff83300ef3>] inet_sendmsg+0x203/0x4d0 net/ipv4/af_inet.c:755
Freed:
[<ffffffff82f372fd>] __kfree_skb+0x1d/0x20 net/core/skbuff.c:676
[<ffffffff83288834>] sk_wmem_free_skb include/net/sock.h:1447 [inline]
[<ffffffff83288834>] tcp_write_queue_purge include/net/tcp.h:1460 [inline]
[<ffffffff83288834>] tcp_connect_init net/ipv4/tcp_output.c:3122 [inline]
[<ffffffff83288834>] tcp_connect+0xb24/0x30c0 net/ipv4/tcp_output.c:3261
[<ffffffff8329b991>] tcp_v4_connect+0xf31/0x1890 net/ipv4/tcp_ipv4.c:246
BUG: KASAN: use-after-free in tcp_skb_pcount include/net/tcp.h:796 [inline]
BUG: KASAN: use-after-free in tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
BUG: KASAN: use-after-free in tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
[<ffffffff81515cd5>] kasan_report.cold.7+0x175/0x2f7 mm/kasan/report.c:408
[<ffffffff814f9784>] __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:427
[<ffffffff83286582>] tcp_skb_pcount include/net/tcp.h:796 [inline]
[<ffffffff83286582>] tcp_init_tso_segs net/ipv4/tcp_output.c:1619 [inline]
[<ffffffff83286582>] tcp_write_xmit+0x3fc2/0x4cb0 net/ipv4/tcp_output.c:2056
[<ffffffff83287a40>] __tcp_push_pending_frames+0xa0/0x290 net/ipv4/tcp_output.c:2307
stable 4.4 and stable 4.9 don't have the commit abb4a8b870b5 ("tcp: purge write queue upon RST")
which is referred in dbbf2d1e4077,
in tcp_connect_init, it calls tcp_write_queue_purge, and does not reset sk_send_head, then UAF.
stable 4.14 have the commit abb4a8b870b5 ("tcp: purge write queue upon RST"),
in tcp_reset, it calls tcp_write_queue_purge(sk), and does not reset sk_send_head, then UAF.
So this patch can be used to fix stable 4.4 and 4.9.
Fixes: a27fd7a8ed38 (tcp: purge write queue upon RST)
Reported-by: Timofey Titovets <nefelim4ag@gmail.com>
Reported-by: Yongjian Xu <yongjianchn@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Tested-by: Yongjian Xu <yongjianchn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
init_dummy_netdev() leaves its netdev_ops pointer zeroed. This leads
to a NULL pointer dereference when sk_busy_loop fires against an iwlwifi
wireless adapter and checks napi->dev->netdev_ops->ndo_busy_poll.
Avoid this by ensuring napi->dev->netdev_ops is valid before following
the pointer, avoiding the following panic when busy polling on a dummy
netdev:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
IP: [<ffffffff817b4b72>] sk_busy_loop+0x92/0x2f0
Call Trace:
[<ffffffff815a3134>] ? uart_write_room+0x74/0xf0
[<ffffffff817964a9>] sock_poll+0x99/0xa0
[<ffffffff81223142>] do_sys_poll+0x2e2/0x520
[<ffffffff8118d3fc>] ? get_page_from_freelist+0x3bc/0xa30
[<ffffffff810ada22>] ? update_curr+0x62/0x140
[<ffffffff811ea671>] ? __slab_free+0xa1/0x2a0
[<ffffffff811ea671>] ? __slab_free+0xa1/0x2a0
[<ffffffff8179dbb1>] ? skb_free_head+0x21/0x30
[<ffffffff81221bd0>] ? poll_initwait+0x50/0x50
[<ffffffff811eaa36>] ? kmem_cache_free+0x1c6/0x1e0
[<ffffffff815a4884>] ? uart_write+0x124/0x1d0
[<ffffffff810bd1cd>] ? remove_wait_queue+0x4d/0x60
[<ffffffff810bd224>] ? __wake_up+0x44/0x50
[<ffffffff81582731>] ? tty_write_unlock+0x31/0x40
[<ffffffff8158c5c6>] ? tty_ldisc_deref+0x16/0x20
[<ffffffff81584820>] ? tty_write+0x1e0/0x2f0
[<ffffffff81587e50>] ? process_echoes+0x80/0x80
[<ffffffff8120c17b>] ? __vfs_write+0x2b/0x130
[<ffffffff8120d09a>] ? vfs_write+0x15a/0x1a0
[<ffffffff81223455>] SyS_poll+0x75/0x100
[<ffffffff819a6524>] entry_SYSCALL_64_fastpath+0x24/0xcf
Commit 79e7fff47b7b ("net: remove support for per driver ndo_busy_poll()")
indirectly fixed this upstream in linux-4.11 by removing the offending
pointer usage. No other users of napi->dev touch its netdev_ops.
Fixes: 8b80cda536ea ("net: rename include/net/ll_poll.h to include/net/busy_poll.h") # 4.4.y
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d5bb334a8e171b262e48f378bd2096c0ea458265 upstream.
The minimum encryption key size for LE connections is 56 bits and to
align LE with BR/EDR, enforce 56 bits of minimum encryption key size for
BR/EDR connections as well.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|