| Age | Commit message (Collapse) | Author |
|
[ Upstream commit a05754295e01f006a651eec759c5dbe682ef6cef ]
Move to using refcount_t rather than atomic_t for refcounts in rxrpc.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 3bcd6c7eaa53 ("rxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 986d93f55bdeab1cac858d1e47b41fac10b2d7f6 ]
Shifting signed 32-bit value by 31 bits is undefined, so changing
significant bit to unsigned. The UBSAN warning calltrace like below:
UBSAN: shift-out-of-bounds in kernel/auditfilter.c:179:23
left shift of 1 by 31 places cannot be represented in type 'int'
Call Trace:
<TASK>
dump_stack_lvl+0x7d/0xa5
dump_stack+0x15/0x1b
ubsan_epilogue+0xe/0x4e
__ubsan_handle_shift_out_of_bounds+0x1e7/0x20c
audit_register_class+0x9d/0x137
audit_classes_init+0x4d/0xb8
do_one_initcall+0x76/0x430
kernel_init_freeable+0x3b3/0x422
kernel_init+0x24/0x1e0
ret_from_fork+0x1f/0x30
</TASK>
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
[PM: remove bad 'Fixes' tag as issue predates git, added in v2.6.6-rc1]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8925c31c1ac2f1e05da988581f2a70a2a8c4d638 ]
Preparing to move serial_rs485 struct sanitization into serial core,
each driver has to provide what fields/flags it supports. This
information is pointed into by rs485_supported.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20220606100433.13793-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 76bad3f88750 ("tty: serial: fsl_lpuart: don't break the on-going transfer when global reset")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit e3e6e1d16a4cf7b63159ec71774e822194071954 upstream.
Syzkaller reports buffer overflow false positive as follows:
------------[ cut here ]------------
memcpy: detected field-spanning write (size 8) of single field
"&compat_event->pointer" at net/wireless/wext-core.c:623 (size 4)
WARNING: CPU: 0 PID: 3607 at net/wireless/wext-core.c:623
wireless_send_event+0xab5/0xca0 net/wireless/wext-core.c:623
Modules linked in:
CPU: 1 PID: 3607 Comm: syz-executor659 Not tainted
6.0.0-rc6-next-20220921-syzkaller #0
[...]
Call Trace:
<TASK>
ioctl_standard_call+0x155/0x1f0 net/wireless/wext-core.c:1022
wireless_process_ioctl+0xc8/0x4c0 net/wireless/wext-core.c:955
wext_ioctl_dispatch net/wireless/wext-core.c:988 [inline]
wext_ioctl_dispatch net/wireless/wext-core.c:976 [inline]
wext_handle_ioctl+0x26b/0x280 net/wireless/wext-core.c:1049
sock_ioctl+0x285/0x640 net/socket.c:1220
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...]
</TASK>
Wireless events will be sent on the appropriate channels in
wireless_send_event(). Different wireless events may have different
payload structure and size, so kernel uses **len** and **cmd** field
in struct __compat_iw_event as wireless event common LCP part, uses
**pointer** as a label to mark the position of remaining different part.
Yet the problem is that, **pointer** is a compat_caddr_t type, which may
be smaller than the relative structure at the same position. So during
wireless_send_event() tries to parse the wireless events payload, it may
trigger the memcpy() run-time destination buffer bounds checking when the
relative structure's data is copied to the position marked by **pointer**.
This patch solves it by introducing flexible-array field **ptr_bytes**,
to mark the position of the wireless events remaining part next to
LCP part. What's more, this patch also adds **ptr_len** variable in
wireless_send_event() to improve its maintainability.
Reported-and-tested-by: syzbot+473754e5af963cf014cf@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/00000000000070db2005e95a5984@google.com/
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 58e0be1ef6118c5352b56a4d06e974c5599993a5 ]
kernel test robot reported warnings when build bonding module with
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/bonding/:
from ../drivers/net/bonding/bond_main.c:35:
In function ‘fortify_memcpy_chk’,
inlined from ‘iph_to_flow_copy_v4addrs’ at ../include/net/ip.h:566:2,
inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3984:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘fortify_memcpy_chk’,
inlined from ‘iph_to_flow_copy_v6addrs’ at ../include/net/ipv6.h:900:2,
inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3994:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is because we try to copy the whole ip/ip6 address to the flow_key,
while we only point the to ip/ip6 saddr. Note that since these are UAPI
headers, __struct_group() is used to avoid the compiler warnings.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: c3f8324188fa ("net: Add full IPv6 addresses to flow_keys")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20221115142400.1204786-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bedf06833b1f63c2627bd5634602e05592129d7a ]
Move the declaration of 'struct trace_array' out of #ifdef
CONFIG_TRACING block, to fix the following warning when CONFIG_TRACING
is not set:
>> include/linux/trace.h:63:45: warning: 'struct trace_array' declared
inside parameter list will not be visible outside of this definition or
declaration
Link: https://lkml.kernel.org/r/20221107160556.2139463-1-shraash@google.com
Fixes: 1a77dd1c2bb5 ("scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled")
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Arun Easi <aeasi@marvell.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Aashish Sharma <shraash@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 42fb0a1e84ff525ebe560e2baf9451ab69127e2b upstream.
Currently the way polling works on the ring buffer is broken. It will
return immediately if there's any data in the ring buffer whereas a read
will block until the watermark (defined by the tracefs buffer_percent file)
is hit.
That is, a select() or poll() will return as if there's data available,
but then the following read will block. This is broken for the way
select()s and poll()s are supposed to work.
Have the polling on the ring buffer also block the same way reads and
splice does on the ring buffer.
Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Primiano Tucci <primiano@google.com>
Cc: stable@vger.kernel.org
Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit cef8cdc0d0e7c701fe4dcfba4ed3fd25d28a6020 ]
Do not soft reset AC DLL as controller is buggy and this operation my
introduce glitches in the controller leading to undefined behavior.
Fixes: f0bbf17958e8 ("ARM: at91: pm: add self-refresh support for sama7g5")
Depends-on: a02875c4cbd6 ("ARM: at91: pm: fix self-refresh for sama7g5")
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20221026124114.985876-2-claudiu.beznea@microchip.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit cd149eff8d2201a63c074a6d9d03e52926aa535d ]
Currently the driver tries to disable the BIOS write protection
automatically even if this is not what the user wants. For this reason
modify the driver so that by default it does not touch the write
protection. Only if specifically asked by the user (setting writeable=1
command line parameter) the driver tries to disable the BIOS write
protection.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Mauro Lima <mauro.lima@eclypsium.com>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Acked-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20220209122706.42439-2-mika.westerberg@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Stable-dep-of: 92a66cbf6b30 ("spi: intel: Use correct mask for flash and protected regions")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 6312d52838b21f5c4a5afa1269a00df4364fd354 upstream.
Building an allmodconfig kernel arm64 kernel, the following build error
shows up:
In file included from drivers/crypto/marvell/octeontx2/cn10k_cpt.c:4:
include/linux/soc/marvell/octeontx2/asm.h:38:15: error: unknown type name 'u64'
38 | static inline u64 otx2_atomic64_fetch_add(u64 incr, u64 *ptr)
| ^~~
Include linux/types.h in asm.h so the compiler knows what the type
'u64' are.
Fixes: af3826db74d1 ("octeontx2-pf: Use hardware register for CQE count")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20211013135743.3826594-1-anders.roxell@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 000f8870a47bdc36730357883b6aef42bced91ee upstream.
Commit d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP")
fixed an orphan section warning by adding the '.data..decrypted' section
to the linker script under the PERCPU_DECRYPTED_SECTION define but that
placement introduced a panic with !SMP, as the percpu sections are not
instantiated with that configuration so attempting to access variables
defined with DEFINE_PER_CPU_DECRYPTED() will result in a page fault.
Move the '.data..decrypted' section to the DATA_MAIN define so that the
variables in it are properly instantiated at boot time with
CONFIG_SMP=n.
Cc: stable@vger.kernel.org
Fixes: d4c639990036 ("vmlinux.lds.h: Avoid orphan section with !SMP")
Link: https://lore.kernel.org/cbbd3548-880c-d2ca-1b67-5bb93b291d5f@huawei.com/
Debugged-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: Zhao Wenhui <zhaowenhui8@huawei.com>
Tested-by: xiafukun <xiafukun@huawei.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221108174934.3384275-1-nathan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 46653972e3ea64f79e7f8ae3aa41a4d3fdb70a13 ]
Shifting signed 32-bit value by 31 bits is undefined, so changing
significant bit to unsigned. The UBSAN warning calltrace like below:
UBSAN: shift-out-of-bounds in security/commoncap.c:1252:2
left shift of 1 by 31 places cannot be represented in type 'int'
Call Trace:
<TASK>
dump_stack_lvl+0x7d/0xa5
dump_stack+0x15/0x1b
ubsan_epilogue+0xe/0x4e
__ubsan_handle_shift_out_of_bounds+0x1e7/0x20c
cap_task_prctl+0x561/0x6f0
security_task_prctl+0x5a/0xb0
__x64_sys_prctl+0x61/0x8f0
do_syscall_64+0x58/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
Fixes: e338d263a76a ("Add 64-bit capability support to the kernel")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit af3826db74d184bc9c2c9d3ff34548e5f317a6f3 ]
Current driver uses software CQ head pointer to poll on CQE
header in memory to determine if CQE is valid. Software needs
to make sure, that the reads of the CQE do not get re-ordered
so much that it ends up with an inconsistent view of the CQE.
To ensure that DMB barrier after read to first CQE cacheline
and before reading of the rest of the CQE is needed.
But having barrier for every CQE read will impact the performance,
instead use hardware CQ head and tail pointers to find the
valid number of CQEs.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 51afe9026d0c ("octeontx2-pf: NIX TX overwrites SQ_CTX_HW_S[SQ_INT]")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b239da34203f49c40b5d656220c39647c3ff0b3c ]
For a lot of use cases in future patches, we will want to modify the
state of registers part of some same 'group' (e.g. same ref_obj_id). It
won't just be limited to releasing reference state, but setting a type
flag dynamically based on certain actions, etc.
Hence, we need a way to easily pass a callback to the function that
iterates over all registers in current bpf_verifier_state in all frames
upto (and including) the curframe.
While in C++ we would be able to easily use a lambda to pass state and
the callback together, sadly we aren't using C++ in the kernel. The next
best thing to avoid defining a function for each case seems like
statement expressions in GNU C. The kernel already uses them heavily,
hence they can passed to the macro in the style of a lambda. The
statement expression will then be substituted in the for loop bodies.
Variables __state and __reg are set to current bpf_func_state and reg
for each invocation of the expression inside the passed in verifier
state.
Then, convert mark_ptr_or_null_regs, clear_all_pkt_pointers,
release_reference, find_good_pkt_pointers, find_equal_scalars to
use bpf_for_each_reg_in_vstate.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220904204145.3089-16-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: f1db20814af5 ("bpf: Fix wrong reg type conversion in release_reference()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8bbabb3fddcd0f858be69ed5abc9b470a239d6f2 ]
Stanislav reported a lockdep warning, which is caused by the
cancel_work_sync() called inside sock_map_close(), as analyzed
below by Jakub:
psock->work.func = sk_psock_backlog()
ACQUIRE psock->work_mutex
sk_psock_handle_skb()
skb_send_sock()
__skb_send_sock()
sendpage_unlocked()
kernel_sendpage()
sock->ops->sendpage = inet_sendpage()
sk->sk_prot->sendpage = tcp_sendpage()
ACQUIRE sk->sk_lock
tcp_sendpage_locked()
RELEASE sk->sk_lock
RELEASE psock->work_mutex
sock_map_close()
ACQUIRE sk->sk_lock
sk_psock_stop()
sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED)
cancel_work_sync()
__cancel_work_timer()
__flush_work()
// wait for psock->work to finish
RELEASE sk->sk_lock
We can move the cancel_work_sync() out of the sock lock protection,
but still before saved_close() was called.
Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20221102043417.279409-1-xiyou.wangcong@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d8616ee2affcff37c5d315310da557a694a3303d ]
During TCP sockmap redirect pressure test, the following warning is triggered:
WARNING: CPU: 3 PID: 2145 at net/core/stream.c:205 sk_stream_kill_queues+0xbc/0xd0
CPU: 3 PID: 2145 Comm: iperf Kdump: loaded Tainted: G W 5.10.0+ #9
Call Trace:
inet_csk_destroy_sock+0x55/0x110
inet_csk_listen_stop+0xbb/0x380
tcp_close+0x41b/0x480
inet_release+0x42/0x80
__sock_release+0x3d/0xa0
sock_close+0x11/0x20
__fput+0x9d/0x240
task_work_run+0x62/0x90
exit_to_user_mode_prepare+0x110/0x120
syscall_exit_to_user_mode+0x27/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason we observed is that:
When the listener is closing, a connection may have completed the three-way
handshake but not accepted, and the client has sent some packets. The child
sks in accept queue release by inet_child_forget()->inet_csk_destroy_sock(),
but psocks of child sks have not released.
To fix, add sock_map_destroy to release psocks.
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220524075311.649153-1-wangyufen@huawei.com
Stable-dep-of: 8bbabb3fddcd ("bpf, sock_map: Move cancel_work_sync() out of sock lock")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 161a438d730dade2ba2b1bf8785f0759aba4ca5f upstream.
We no longer need at least 64 bytes of random seed to permit the early
crng init to complete. The RNG is now based on Blake2s, so reduce the
EFI seed size to the Blake2s hash size, which is sufficient for our
purposes.
While at it, drop the READ_ONCE(), which was supposed to prevent size
from being evaluated after seed was unmapped. However, this cannot
actually happen, so READ_ONCE() is unnecessary here.
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 11052589cf5c0bab3b4884d423d5f60c38fcf25d upstream.
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made
it possible to enable/disable early_demux on a per-netns basis. Then, we
introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for
TCP/UDP in commit dddb64bcb346 ("net: Add sysctl to toggle early demux for
tcp and udp"). However, the .proc_handler() was wrong and actually
disabled us from changing the behaviour in each netns.
We can execute early_demux if net.ipv4.ip_early_demux is on and each proto
.early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux,
the change itself is saved in each netns variable, but the .early_demux()
handler is a global variable, so the handler is switched based on the
init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have
nothing to do with the logic. Whether we CAN execute proto .early_demux()
is always decided by init_net's sysctl knob, and whether we DO it or not is
by each netns ip_early_demux knob.
This patch namespacifies (tcp|udp)_early_demux again. For now, the users
of the .early_demux() handler are TCP and UDP only, and they are called
directly to avoid retpoline. So, we can remove the .early_demux() handler
from inet6?_protos and need not dereference them in ip6?_rcv_finish_core().
If another proto needs .early_demux(), we can restore it at that time.
Fixes: dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ccd30a476f8e864732de220bd50e6f372f5ebcab upstream.
Commit d7e7b9af104c ("fscrypt: stop using keyrings subsystem for
fscrypt_master_key") moved the keyring destruction from __put_super() to
generic_shutdown_super() so that the filesystem's block device(s) are
still available. Unfortunately, this causes a memory leak in the case
where a mount is attempted with the test_dummy_encryption mount option,
but the mount fails after the option has already been processed.
To fix this, attempt the keyring destruction in both places.
Reported-by: syzbot+104c2a89561289cec13e@syzkaller.appspotmail.com
Fixes: d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Link: https://lore.kernel.org/r/20221011213838.209879-1-ebiggers@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d7e7b9af104c7b389a0c21eb26532511bce4b510 upstream.
The approach of fs/crypto/ internally managing the fscrypt_master_key
structs as the payloads of "struct key" objects contained in a
"struct key" keyring has outlived its usefulness. The original idea was
to simplify the code by reusing code from the keyrings subsystem.
However, several issues have arisen that can't easily be resolved:
- When a master key struct is destroyed, blk_crypto_evict_key() must be
called on any per-mode keys embedded in it. (This started being the
case when inline encryption support was added.) Yet, the keyrings
subsystem can arbitrarily delay the destruction of keys, even past the
time the filesystem was unmounted. Therefore, currently there is no
easy way to call blk_crypto_evict_key() when a master key is
destroyed. Currently, this is worked around by holding an extra
reference to the filesystem's request_queue(s). But it was overlooked
that the request_queue reference is *not* guaranteed to pin the
corresponding blk_crypto_profile too; for device-mapper devices that
support inline crypto, it doesn't. This can cause a use-after-free.
- When the last inode that was using an incompletely-removed master key
is evicted, the master key removal is completed by removing the key
struct from the keyring. Currently this is done via key_invalidate().
Yet, key_invalidate() takes the key semaphore. This can deadlock when
called from the shrinker, since in fscrypt_ioctl_add_key(), memory is
allocated with GFP_KERNEL under the same semaphore.
- More generally, the fact that the keyrings subsystem can arbitrarily
delay the destruction of keys (via garbage collection delay, or via
random processes getting temporary key references) is undesirable, as
it means we can't strictly guarantee that all secrets are ever wiped.
- Doing the master key lookups via the keyrings subsystem results in the
key_permission LSM hook being called. fscrypt doesn't want this, as
all access control for encrypted files is designed to happen via the
files themselves, like any other files. The workaround which SELinux
users are using is to change their SELinux policy to grant key search
access to all domains. This works, but it is an odd extra step that
shouldn't really have to be done.
The fix for all these issues is to change the implementation to what I
should have done originally: don't use the keyrings subsystem to keep
track of the filesystem's fscrypt_master_key structs. Instead, just
store them in a regular kernel data structure, and rework the reference
counting, locking, and lifetime accordingly. Retain support for
RCU-mode key lookups by using a hash table. Replace fscrypt_sb_free()
with fscrypt_sb_delete(), which releases the keys synchronously and runs
a bit earlier during unmount, so that block devices are still available.
A side effect of this patch is that neither the master keys themselves
nor the filesystem keyrings will be listed in /proc/keys anymore.
("Master key users" and the master key users keyrings will still be
listed.) However, this was mostly an implementation detail, and it was
intended just for debugging purposes. I don't know of anyone using it.
This patch does *not* change how "master key users" (->mk_users) works;
that still uses the keyrings subsystem. That is still needed for key
quotas, and changing that isn't necessary to solve the issues listed
above. If we decide to change that too, it would be a separate patch.
I've marked this as fixing the original commit that added the fscrypt
keyring, but as noted above the most important issue that this patch
fixes wasn't introduced until the addition of inline encryption support.
Fixes: 22d94f493bfb ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220901193208.138056-2-ebiggers@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 43d2748394c3feb86c0c771466f5847e274fc043 ]
Change num_ghes from int to unsigned int, preventing an overflow
and causing subsequent vmalloc() to fail.
The overflow happens in ghes_estatus_pool_init() when calculating
len during execution of the statement below as both multiplication
operands here are signed int:
len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
The following call trace is observed because of this bug:
[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1
[ 9.317131] Call Trace:
[ 9.317134] <TASK>
[ 9.317137] dump_stack_lvl+0x49/0x5f
[ 9.317145] dump_stack+0x10/0x12
[ 9.317146] warn_alloc.cold+0x7b/0xdf
[ 9.317150] ? __device_attach+0x16a/0x1b0
[ 9.317155] __vmalloc_node_range+0x702/0x740
[ 9.317160] ? device_add+0x17f/0x920
[ 9.317164] ? dev_set_name+0x53/0x70
[ 9.317166] ? platform_device_add+0xf9/0x240
[ 9.317168] __vmalloc_node+0x49/0x50
[ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0
[ 9.317176] vmalloc+0x21/0x30
[ 9.317177] ghes_estatus_pool_init+0x43/0xa0
[ 9.317179] acpi_hest_init+0x129/0x19c
[ 9.317185] acpi_init+0x434/0x4a4
[ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a
[ 9.317190] do_one_initcall+0x48/0x200
[ 9.317195] kernel_init_freeable+0x221/0x284
[ 9.317200] ? rest_init+0xe0/0xe0
[ 9.317204] kernel_init+0x1a/0x130
[ 9.317205] ret_from_fork+0x22/0x30
[ 9.317208] </TASK>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 2ba3e38517f5a4ebf9c997168079dca01b7f9fc6 ]
The state argument for the functions for obtaining various parts of the
state is NULL if it is called by drivers for active state. Fail graciously
in that case instead of dereferencing a NULL pointer.
Suggested-by: Bingbu Cao <bingbu.cao@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fac35ba763ed07ba93154c95ffc0c4a55023707f ]
On some architectures (like ARM64), it can support CONT-PTE/PMD size
hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte(). However this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under current pte
lock is unstable in follow_page_pte(), we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by move_pages() syscall under the lock, however antoher thread B can
migrate the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page, if thread A also wants to do page
migration, then data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb:
refactor find_num_contig()"). Patch series "Support for contiguous pte
hugepages", v4. However, I do not believe these code paths were
executed until migration support was added with 5480280d3f2d ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with 5480280d3f2d for the Fixes: targe.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bacd22df95147ed673bec4692ab2d4d585935241 ]
mlx5_cmd_cleanup_async_ctx should return only after all its callback
handlers were completed. Before this patch, the below race between
mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible and
lead to a use-after-free:
1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e.
elevated by 1, a single inflight callback).
2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1.
3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and
is about to call wake_up().
4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns
immediately as the condition (num_inflight == 0) holds.
5. mlx5_cmd_cleanup_async_ctx returns.
6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx
object.
7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed
object.
Fix it by syncing using a completion object. Mark it completed when
num_inflight reaches 0.
Trace:
BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270
Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x57/0x7d
print_report.cold+0x2d5/0x684
? do_raw_spin_lock+0x23d/0x270
kasan_report+0xb1/0x1a0
? do_raw_spin_lock+0x23d/0x270
do_raw_spin_lock+0x23d/0x270
? rwlock_bug.part.0+0x90/0x90
? __delete_object+0xb8/0x100
? lock_downgrade+0x6e0/0x6e0
_raw_spin_lock_irqsave+0x43/0x60
? __wake_up_common_lock+0xb9/0x140
__wake_up_common_lock+0xb9/0x140
? __wake_up_common+0x650/0x650
? destroy_tis_callback+0x53/0x70 [mlx5_core]
? kasan_set_track+0x21/0x30
? destroy_tis_callback+0x53/0x70 [mlx5_core]
? kfree+0x1ba/0x520
? do_raw_spin_unlock+0x54/0x220
mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core]
? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core]
? dump_command+0xcc0/0xcc0 [mlx5_core]
? lockdep_hardirqs_on_prepare+0x400/0x400
? cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
atomic_notifier_call_chain+0xd7/0x1d0
mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core]
atomic_notifier_call_chain+0xd7/0x1d0
? irq_release+0x140/0x140 [mlx5_core]
irq_int_handler+0x19/0x30 [mlx5_core]
__handle_irq_event_percpu+0x1f2/0x620
handle_irq_event+0xb2/0x1d0
handle_edge_irq+0x21e/0xb00
__common_interrupt+0x79/0x1a0
common_interrupt+0x78/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0x42/0x60
Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00
RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242
RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110
RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc
RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3
R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005
R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000
? default_idle_call+0xcc/0x450
default_idle_call+0xec/0x450
do_idle+0x394/0x450
? arch_cpu_idle_exit+0x40/0x40
? do_idle+0x17/0x450
cpu_startup_entry+0x19/0x20
start_secondary+0x221/0x2b0
? set_cpu_sibling_map+0x2070/0x2070
secondary_startup_64_no_verify+0xcd/0xdb
</TASK>
Allocated by task 49502:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
kvmalloc_node+0x48/0xe0
mlx5e_bulk_async_init+0x35/0x110 [mlx5_core]
mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core]
mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
mlx5e_suspend+0xdb/0x140 [mlx5_core]
mlx5e_remove+0x89/0x190 [mlx5_core]
auxiliary_bus_remove+0x52/0x70
device_release_driver_internal+0x40f/0x650
driver_detach+0xc1/0x180
bus_remove_driver+0x125/0x2f0
auxiliary_driver_unregister+0x16/0x50
mlx5e_cleanup+0x26/0x30 [mlx5_core]
cleanup+0xc/0x4e [mlx5_core]
__x64_sys_delete_module+0x2b5/0x450
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Freed by task 49502:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_set_free_info+0x20/0x30
____kasan_slab_free+0x11d/0x1b0
kfree+0x1ba/0x520
mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core]
mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
mlx5e_suspend+0xdb/0x140 [mlx5_core]
mlx5e_remove+0x89/0x190 [mlx5_core]
auxiliary_bus_remove+0x52/0x70
device_release_driver_internal+0x40f/0x650
driver_detach+0xc1/0x180
bus_remove_driver+0x125/0x2f0
auxiliary_driver_unregister+0x16/0x50
mlx5e_cleanup+0x26/0x30 [mlx5_core]
cleanup+0xc/0x4e [mlx5_core]
__x64_sys_delete_module+0x2b5/0x450
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: e355477ed9e4 ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8da7f0976b9071b528c545008de9d10cc81883b1 ]
If it is a progressive (non-interlaced) format, then ignore the
interlaced timing values.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 7f68127fa11f ([media] videodev2.h: defines to calculate blanking and frame sizes)
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 720ca52bcef225b967a339e0fffb6d0c7e962240 ]
As Shakeel explains the commit under Fixes had the unintended
side-effect of no longer pre-loading the cached memory allowance.
Even tho we previously dropped the first packet received when
over memory limit - the consecutive ones would get thru by using
the cache. The charging was happening in batches of 128kB, so
we'd let in 128kB (truesize) worth of packets per one drop.
After the change we no longer force charge, there will be no
cache filling side effects. This causes significant drops and
connection stalls for workloads which use a lot of page cache,
since we can't reclaim page cache under GFP_NOWAIT.
Some of the latency can be recovered by improving SACK reneg
handling but nowhere near enough to get back to the pre-5.15
performance (the application I'm experimenting with still
sees 5-10x worst latency).
Apply the suggested workaround of using GFP_ATOMIC. We will now
be more permissive than previously as we'll drop _no_ packets
in softirq when under pressure. But I can't think of any good
and simple way to address that within networking.
Link: https://lore.kernel.org/all/20221012163300.795e7b86@kernel.org/
Suggested-by: Shakeel Butt <shakeelb@google.com>
Fixes: 4b1327be9fe5 ("net-memcg: pass in gfp_t mask to mem_cgroup_charge_skmem()")
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lore.kernel.org/r/20221021160304.1362511-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit ca6c21327c6af02b7eec31ce4b9a740a18c6c13f ]
Marco reported:
Due to the implementation of how SIGTRAP are delivered if
perf_event_attr::sigtrap is set, we've noticed 3 issues:
1. Missing SIGTRAP due to a race with event_sched_out() (more
details below).
2. Hardware PMU events being disabled due to returning 1 from
perf_event_overflow(). The only way to re-enable the event is
for user space to first "properly" disable the event and then
re-enable it.
3. The inability to automatically disable an event after a
specified number of overflows via PERF_EVENT_IOC_REFRESH.
The worst of the 3 issues is problem (1), which occurs when a
pending_disable is "consumed" by a racing event_sched_out(), observed
as follows:
CPU0 | CPU1
--------------------------------+---------------------------
__perf_event_overflow() |
perf_event_disable_inatomic() |
pending_disable = CPU0 | ...
| _perf_event_enable()
| event_function_call()
| task_function_call()
| /* sends IPI to CPU0 */
<IPI> | ...
__perf_event_enable() +---------------------------
ctx_resched()
task_ctx_sched_out()
ctx_sched_out()
group_sched_out()
event_sched_out()
pending_disable = -1
</IPI>
<IRQ-work>
perf_pending_event()
perf_pending_event_disable()
/* Fails to send SIGTRAP because no pending_disable! */
</IRQ-work>
In the above case, not only is that particular SIGTRAP missed, but also
all future SIGTRAPs because 'event_limit' is not reset back to 1.
To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction
of a separate 'pending_sigtrap', no longer using 'event_limit' and
'pending_disable' for its delivery.
Additionally; and different to Marco's proposed patch:
- recognise that pending_disable effectively duplicates oncpu for
the case where it is set. As such, change the irq_work handler to
use ->oncpu to target the event and use pending_* as boolean toggles.
- observe that SIGTRAP targets the ctx->task, so the context switch
optimization that carries contexts between tasks is invalid. If
the irq_work were delayed enough to hit after a context switch the
SIGTRAP would be delivered to the wrong task.
- observe that if the event gets scheduled out
(rotation/migration/context-switch/...) the irq-work would be
insufficient to deliver the SIGTRAP when the event gets scheduled
back in (the irq-work might still be pending on the old CPU).
Therefore have event_sched_out() convert the pending sigtrap into a
task_work which will deliver the signal at return_to_user.
Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Debugged-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Marco Elver <elver@google.com>
Debugged-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Marco Elver <elver@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit bb9ea2c31fa11b789ade4c3abcdda3c5370a76ab ]
The doc says the I²C device's name is used if devname is NULL, but
actually the I²C device driver's name is used.
Fixes: 0658293012af ("media: v4l: subdev: Add a function to set an I²C sub-device's name")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit c3ed222745d9ad7b69299b349a64ba533c64a34f upstream.
Send along the already-allocated fattr along with nfs4_fs_locations, and
drop the memcpy of fattr. We end up growing two more allocations, but this
fixes up a crash as:
PID: 790 TASK: ffff88811b43c000 CPU: 0 COMMAND: "ls"
#0 [ffffc90000857920] panic at ffffffff81b9bfde
#1 [ffffc900008579c0] do_trap at ffffffff81023a9b
#2 [ffffc90000857a10] do_error_trap at ffffffff81023b78
#3 [ffffc90000857a58] exc_stack_segment at ffffffff81be1f45
#4 [ffffc90000857a80] asm_exc_stack_segment at ffffffff81c009de
#5 [ffffc90000857b08] nfs_lookup at ffffffffa0302322 [nfs]
#6 [ffffc90000857b70] __lookup_slow at ffffffff813a4a5f
#7 [ffffc90000857c60] walk_component at ffffffff813a86c4
#8 [ffffc90000857cb8] path_lookupat at ffffffff813a9553
#9 [ffffc90000857cf0] filename_lookup at ffffffff813ab86b
Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 9558a007dbc3 ("NFS: Remove the label from the nfs4_lookup_res struct")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 07d2872bf4c864eb83d034263c155746a2fb7a3b upstream.
Some SD-cards from Sandisk that are SDA-6.0 compliant reports they supports
discard, while they actually don't. This might cause mk2fs to fail while
trying to format the card and revert it to a read-only mode.
To fix this problem, let's add a card quirk (MMC_QUIRK_BROKEN_SD_DISCARD)
to indicate that we shall fall-back to use the legacy erase command
instead.
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220928095744.16455-1-avri.altman@wdc.com
[Ulf: Updated the commit message]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c19d893fbf3f2f8fa864ae39652c7fee939edde2 ]
qdisc_reset() is clearing qdisc->q.qlen and qdisc->qstats.backlog
_after_ calling qdisc->ops->reset. There is no need to clear them
again in the specific reset function.
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220824005231.345727-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 2a3fc78210b9 ("net: sched: sfb: fix null pointer access issue when sfb_init() fails")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 69421bf98482d089e50799f45e48b25ce4a8d154 ]
When we call connect() for a UDP socket in a reuseport group, we have
to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel
could select a unconnected socket wrongly for packets sent to the
connected socket.
However, the current way to set has_conns is illegal and possible to
trigger that problem. reuseport_has_conns() changes has_conns under
rcu_read_lock(), which upgrades the RCU reader to the updater. Then,
it must do the update under the updater's lock, reuseport_lock, but
it doesn't for now.
For this reason, there is a race below where we fail to set has_conns
resulting in the wrong socket selection. To avoid the race, let's split
the reader and updater with proper locking.
cpu1 cpu2
+----+ +----+
__ip[46]_datagram_connect() reuseport_grow()
. .
|- reuseport_has_conns(sk, true) |- more_reuse = __reuseport_alloc(more_socks_size)
| . |
| |- rcu_read_lock()
| |- reuse = rcu_dereference(sk->sk_reuseport_cb)
| |
| | | /* reuse->has_conns == 0 here */
| | |- more_reuse->has_conns = reuse->has_conns
| |- reuse->has_conns = 1 | /* more_reuse->has_conns SHOULD BE 1 HERE */
| | |
| | |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
| | | more_reuse)
| `- rcu_read_unlock() `- kfree_rcu(reuse, rcu)
|
|- sk->sk_state = TCP_ESTABLISHED
Note the likely(reuse) in reuseport_has_conns_set() is always true,
but we put the test there for ease of review. [0]
For the record, usually, sk_reuseport_cb is changed under lock_sock().
The only exception is reuseport_grow() & TCP reqsk migration case.
1) shutdown() TCP listener, which is moved into the latter part of
reuse->socks[] to migrate reqsk.
2) New listen() overflows reuse->socks[] and call reuseport_grow().
3) reuse->max_socks overflows u16 with the new listener.
4) reuseport_grow() pops the old shutdown()ed listener from the array
and update its sk->sk_reuseport_cb as NULL without lock_sock().
shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(),
but, reuseport_has_conns_set() is called only for UDP under lock_sock(),
so likely(reuse) never be false in reuseport_has_conns_set().
[0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/
Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 96de900ae78e7dbedc937fd91bafe2934579c65a ]
The recent commit
'commit 744d23c71af3 ("net: phy: Warn about incorrect
mdio_bus_phy_resume() state")'
requires the MAC driver explicitly tell the phy driver who is
managing the PM, otherwise you will see warning during resume
stage.
Add a boolean property in the phylink_config structure so that
the MAC driver can use it to tell the PHY driver if it wants to
manage the PM.
Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit ed51862f2f57cbce6fed2d4278cfe70a490899fd upstream.
We will introduce the first architecture specific compat vm ioctl in the
next patch. Add all necessary boilerplate to allow architectures to
override compat vm ioctls when necessary.
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-2-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 30393181fdbc1608cc683b4ee99dcce05ffcc8c7 upstream.
This patch adds handling to return -EINVAL for an unknown addr type. The
current behaviour is to return 0 as successful but the size of an
unknown addr type is not defined and should return an error like -EINVAL.
Fixes: 94160108a70c ("net/ieee802154: fix uninit value bug in dgram_sendmsg")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ upstream commit 0091bfc81741b8d3aeb3b7ab8636f911b2de6e80 ]
Instead of putting io_uring's registered files in unix_gc() we want it
to be done by io_uring itself. The trick here is to consider io_uring
registered files for cycle detection but not actually putting them down.
Because io_uring can't register other ring instances, this will remove
all refs to the ring file triggering the ->release path and clean up
with io_ring_ctx_free().
Cc: stable@vger.kernel.org
Fixes: 6b06314c47e1 ("io_uring: add file set registration")
Reported-and-tested-by: David Bouman <dbouman03@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
[axboe: add kerneldoc comment to skb, fold in skb leak fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1a77dd1c2bb5d4a58c16d198cf593720787c02e4 ]
Fix this compilation error seen when CONFIG_TRACING is not enabled:
drivers/scsi/qla2xxx/qla_os.c: In function 'qla_trace_init':
drivers/scsi/qla2xxx/qla_os.c:2854:25: error: implicit declaration of function
'trace_array_get_by_name'; did you mean 'trace_array_set_clr_event'?
[-Werror=implicit-function-declaration]
2854 | qla_trc_array = trace_array_get_by_name("qla2xxx");
| ^~~~~~~~~~~~~~~~~~~~~~~
| trace_array_set_clr_event
drivers/scsi/qla2xxx/qla_os.c: In function 'qla_trace_uninit':
drivers/scsi/qla2xxx/qla_os.c:2869:9: error: implicit declaration of function
'trace_array_put' [-Werror=implicit-function-declaration]
2869 | trace_array_put(qla_trc_array);
| ^~~~~~~~~~~~~~~
Link: https://lore.kernel.org/r/20220907233308.4153-2-aeasi@marvell.com
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9f0deaa12d832f488500a5afe9b912e9b3cfc432 ]
Guard wakeups that the user can trigger, and that may end up triggering a
call back into eventfd_signal. This is in addition to the current approach
that only guards in eventfd_signal.
Rename in_eventfd_signal -> in_eventfd at the same time to reflect this.
Without this there would be a deadlock in the following code using libaio:
int main()
{
struct io_context *ctx = NULL;
struct iocb iocb;
struct iocb *iocbs[] = { &iocb };
int evfd;
uint64_t val = 1;
evfd = eventfd(0, EFD_CLOEXEC);
assert(!io_setup(2, &ctx));
io_prep_poll(&iocb, evfd, POLLIN);
io_set_eventfd(&iocb, evfd);
assert(1 == io_submit(ctx, 1, iocbs));
write(evfd, &val, 8);
}
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220816135959.1490641-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4f58330fcc8482aa90674e1f40f601e82f18ed4a ]
IOMMU_IOVA is intended to be an optional library for users to select as
and when they desire. Since it can be a module now, this means that
built-in code which has chosen not to select it should not fail to link
if it happens to have selected as a module by someone else. Replace
IS_ENABLED() with IS_REACHABLE() to do the right thing.
CC: Thierry Reding <thierry.reding@gmail.com>
Reported-by: John Garry <john.garry@huawei.com>
Fixes: 15bbdec3931e ("iommu: Make the iova library a module")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
Link: https://lore.kernel.org/r/548c2f683ca379aface59639a8f0cccc3a1ac050.1663069227.git.robin.murphy@arm.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 8af809966c0b34cfacd8da9a412689b8e9910354 ]
Add helpers to allow the drivers to run their recv paths from libiscsi's
workqueue.
Link: https://lore.kernel.org/r/20220616224557.115234-3-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 4b9f8ce4d5e823e42944c5a0a4842b0f936365ad ]
Rename iscsi_conn_queue_work() to iscsi_conn_queue_xmit() to reflect that
it handles queueing of xmits only.
Link: https://lore.kernel.org/r/20220616224557.115234-2-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Wu Bo <wubo40@huawei.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 039d4926379b1d1c17b51cf21c500a5eed86899e ]
Invoking TIOCVHANGUP on 8250_mid port on Ice Lake-D and then reopening
the port triggers these faults during serial8250_do_startup():
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write NO_PASID] Request device [00:1a.0] fault addr 0x0 [fault reason 0x05] PTE Write access is not set
If the IRQ hasn't been set up yet, the UART will have zeroes in its MSI
address/data registers. Disabling the IRQ at the interrupt controller
won't stop the UART from performing a DMA write to the address programmed
in its MSI address register (zero) when it wants to signal an interrupt.
The UARTs (in Ice Lake-D) implement PCI 2.1 style MSI without masking
capability, so there is no way to mask the interrupt at the source PCI
function level, except disabling the MSI capability entirely, but that
would cause it to fall back to INTx# assertion, and the PCI specification
prohibits disabling the MSI capability as a way to mask a function's
interrupt service request.
The MSI address register is zeroed by the hangup as the irq is freed.
The interrupt is signalled during serial8250_do_startup() performing a
THRE test that temporarily toggles THRI in IER. The THRE test currently
occurs before UART's irq (and MSI address) is properly set up.
Refactor serial8250_do_startup() such that irq is set up before the
THRE test. The current irq setup code is intermixed with the timer
setup code. As THRE test must be performed prior to the timer setup,
extract it into own function and call it only after the THRE test.
The ->setup_timer() needs to be part of the struct uart_8250_ops in
order to not create circular dependency between 8250 and 8250_base
modules.
Fixes: 40b36daad0ac ("[PATCH] 8250 UART backup timer")
Reported-by: Lennert Buytenhek <buytenh@arista.com>
Tested-by: Lennert Buytenhek <buytenh@arista.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20220922070005.2965-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 630624cb1b5826d753ac8e01a0e42de43d66dedf ]
ACS-5 section
7.13.6.36 Word 78: Serial ATA features supported
states that:
If word 76 is not 0000h or FFFFh, word 78 reports the features supported
by the device. If this word is not supported, the word shall be cleared
to zero.
(This text also exists in really old ACS standards, e.g. ACS-3.)
The problem with ata_id_has_dipm() is that the while it performs a
check against 0 and 0xffff, it performs the check against
ATA_ID_FEATURE_SUPP (word 78), the same word where the feature bit
is stored.
Fix this by performing the check against ATA_ID_SATA_CAPABILITY
(word 76), like required by the spec. The feature bit check itself
is of course still performed against ATA_ID_FEATURE_SUPP (word 78).
Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros
(which already have this check), thus making it more likely that the
next ATA_ID_FEATURE_SUPP macro that is added will include this check.
Fixes: ca77329fb713 ("[libata] Link power management infrastructure")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a5fb6bf853148974dbde092ec1bde553bea5e49f ]
ACS-5 section
7.13.6.36 Word 78: Serial ATA features supported
states that:
If word 76 is not 0000h or FFFFh, word 78 reports the features supported
by the device. If this word is not supported, the word shall be cleared
to zero.
(This text also exists in really old ACS standards, e.g. ACS-3.)
Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros
(which already have this check), thus making it more likely that the
next ATA_ID_FEATURE_SUPP macro that is added will include this check.
Fixes: 5b01e4b9efa0 ("libata: Implement NCQ autosense")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9c6e09a434e1317e09b78b3b69cd384022ec9a03 ]
ACS-5 section
7.13.6.36 Word 78: Serial ATA features supported
states that:
If word 76 is not 0000h or FFFFh, word 78 reports the features supported
by the device. If this word is not supported, the word shall be cleared
to zero.
(This text also exists in really old ACS standards, e.g. ACS-3.)
Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros
(which already have this check), thus making it more likely that the
next ATA_ID_FEATURE_SUPP macro that is added will include this check.
Fixes: 65fe1f0f66a5 ("ahci: implement aggressive SATA device sleep support")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 690aa8c3ae308bc696ec8b1b357b995193927083 ]
ACS-5 section
7.13.6.41 Words 85..87, 120: Commands and feature sets supported or enabled
states that:
If bit 15 of word 86 is set to one, bit 14 of word 119 is set to one,
and bit 15 of word 119 is cleared to zero, then word 119 is valid.
If bit 15 of word 86 is set to one, bit 14 of word 120 is set to one,
and bit 15 of word 120 is cleared to zero, then word 120 is valid.
(This text also exists in really old ACS standards, e.g. ACS-3.)
Currently, ata_id_sense_reporting_enabled() and
ata_id_has_sense_reporting() both check bit 15 of word 86,
but neither of them check that bit 14 of word 119 is set to one,
or that bit 15 of word 119 is cleared to zero.
Additionally, make ata_id_sense_reporting_enabled() return false
if !ata_id_has_sense_reporting(), similar to how e.g.
ata_id_flush_ext_enabled() returns false if !ata_id_has_flush_ext().
Fixes: e87fd28cf9a2 ("libata: Implement support for sense data reporting")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e26ef3af964acfea311403126acee8c56c89e26b ]
This exported fn is unused, and will not be needed. Lets dump it.
The export was added to let drm control pr_debugs, as part of using
them to avoid drm_debug_enabled overheads. But its better to just
implement the drm.debug bitmap interface, then its available for
everyone.
Fixes: a2d375eda771 ("dyndbg: refine export, rename to dynamic_debug_exec_queries()")
Fixes: 4c0d77828d4f ("dyndbg: export ddebug_exec_queries")
Acked-by: Jason Baron <jbaron@akamai.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20220904214134.408619-10-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 85d6b66d31c35158364058ee98fb69ab5bb6a6b1 ]
For CONFIG_DYNAMIC_DEBUG=N, the ddebug_dyndbg_module_param_cb()
stub-fn is too permissive:
bash-5.1# modprobe drm JUNKdyndbg
bash-5.1# modprobe drm dyndbgJUNK
[ 42.933220] dyndbg param is supported only in CONFIG_DYNAMIC_DEBUG builds
[ 42.937484] ACPI: bus type drm_connector registered
This caused no ill effects, because unknown parameters are either
ignored by default with an "unknown parameter" warning, or ignored
because dyndbg allows its no-effect use on non-dyndbg builds.
But since the code has an explicit feedback message, it should be
issued accurately. Fix with strcmp for exact param-name match.
Fixes: b48420c1d301 dynamic_debug: make dynamic-debug work for module initialization
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Jason Baron <jbaron@akamai.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Link: https://lore.kernel.org/r/20220904214134.408619-3-jim.cromie@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 13ad1125b941a5f257d9d3ae70485773abd34792 ]
According to the ib spec:
If the CI supports the Base Memory Management Extensions defined in this
specification, the L_Key format must consist of:
24 bit index in the most significant bits of the R_Key, and
8 bit key in the least significant bits of the R_Key
Through a successful Allocate L_Key verb invocation, the CI must let the
consumer own the key portion of the returned R_Key
Therefore, when creating a mkey using DEVX, the consumer is allowed to
change the key part. The kernel should compare only the index part of a
R_Key to determine equality with another R_Key.
Adding capability in order not to break backward compatibility.
Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP")
Link: https://lore.kernel.org/r/3d669aacea85a3a15c3b3b953b3eaba3f80ef9be.1659255945.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 62c07983bef9d3e78e71189441e1a470f0d1e653 ]
Christophe Leroy reported a ~80ms latency spike
happening at first TCP connect() time.
This is because __inet_hash_connect() uses get_random_once()
to populate a perturbation table which became quite big
after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")
get_random_once() uses DO_ONCE(), which block hard irqs for the duration
of the operation.
This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock
for operations where we prefer to stay in process context.
Then __inet_hash_connect() can use get_random_slow_once()
to populate its perturbation table.
Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")
Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|