summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2023-06-09firmware: qcom_scm: Use fixed width src vm bitmapElliot Berman
[ Upstream commit 968a26a07f75377afbd4f7bb18ef587a1443c244 ] The maximum VMID for assign_mem is 63. Use a u64 to represent this bitmap instead of architecture-dependent "unsigned int" which varies in size on 32-bit and 64-bit platforms. Acked-by: Kalle Valo <kvalo@kernel.org> (ath10k) Tested-by: Gokul krishna Krishnakumar <quic_gokukris@quicinc.com> Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230213181832.3489174-1-quic_eberman@quicinc.com Stable-dep-of: a6e766dea0a2 ("misc: fastrpc: Pass proper scm arguments for secure map request") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09scsi: ufs: core: Rename symbol sizeof_utp_transfer_cmd_desc()Po-Wen Kao
[ Upstream commit 06caeb536b2b21668efd2d6fa97c09461957b3a7 ] Naming the functions after standard operators like sizeof() may cause confusion. Rename it to ufshcd_get_ucd_size(). Signed-off-by: Po-Wen Kao <powen.kao@mediatek.com> Link: https://lore.kernel.org/r/20230504154454.26654-3-powen.kao@mediatek.com Suggested-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Ziqi Chen <quic_ziqichen@quicinc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09media: dvb-core: Fix use-after-free due to race at dvb_register_device()Hyunwoo Kim
[ Upstream commit 627bb528b086b4136315c25d6a447a98ea9448d3 ] dvb_register_device() dynamically allocates fops with kmemdup() to set the fops->owner. And these fops are registered in 'file->f_ops' using replace_fops() in the dvb_device_open() process, and kfree()d in dvb_free_device(). However, it is not common to use dynamically allocated fops instead of 'static const' fops as an argument of replace_fops(), and UAF may occur. These UAFs can occur on any dvb type using dvb_register_device(), such as dvb_dvr, dvb_demux, dvb_frontend, dvb_net, etc. So, instead of kfree() the fops dynamically allocated in dvb_register_device() in dvb_free_device() called during the .disconnect() process, kfree() it collectively in exit_dvbdev() called when the dvbdev.c module is removed. Link: https://lore.kernel.org/linux-media/20221117045925.14297-4-imv4bel@gmail.com Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09media: dvb-core: Fix use-after-free due on race condition at dvb_netHyunwoo Kim
[ Upstream commit 4172385b0c9ac366dcab78eda48c26814b87ed1a ] A race condition may occur between the .disconnect function, which is called when the device is disconnected, and the dvb_device_open() function, which is called when the device node is open()ed. This results in several types of UAFs. The root cause of this is that you use the dvb_device_open() function, which does not implement a conditional statement that checks 'dvbnet->exit'. So, add 'remove_mutex` to protect 'dvbnet->exit' and use locked_dvb_net_open() function to check 'dvbnet->exit'. [mchehab: fix a checkpatch warning] Link: https://lore.kernel.org/linux-media/20221117045925.14297-3-imv4bel@gmail.com Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09media: dvb-core: Fix use-after-free on race condition at dvb_frontendHyunwoo Kim
[ Upstream commit 6769a0b7ee0c3b31e1b22c3fadff2bfb642de23f ] If the device node of dvb_frontend is open() and the device is disconnected, many kinds of UAFs may occur when calling close() on the device node. The root cause of this is that wake_up() for dvbdev->wait_queue is implemented in the dvb_frontend_release() function, but wait_event() is not implemented in the dvb_frontend_stop() function. So, implement wait_event() function in dvb_frontend_stop() and add 'remove_mutex' which prevents race condition for 'fe->exit'. [mchehab: fix a couple of checkpatch warnings and some mistakes at the error handling logic] Link: https://lore.kernel.org/linux-media/20221117045925.14297-2-imv4bel@gmail.com Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09tcp: fix mishandling when the sack compression is deferred.fuyuanli
[ Upstream commit 30c6f0bf9579debce27e45fac34fdc97e46acacc ] In this patch, we mainly try to handle sending a compressed ack correctly if it's deferred. Here are more details in the old logic: When sack compression is triggered in the tcp_compressed_ack_kick(), if the sock is owned by user, it will set TCP_DELACK_TIMER_DEFERRED and then defer to the release cb phrase. Later once user releases the sock, tcp_delack_timer_handler() should send a ack as expected, which, however, cannot happen due to lack of ICSK_ACK_TIMER flag. Therefore, the receiver would not sent an ack until the sender's retransmission timeout. It definitely increases unnecessary latency. Fixes: 5d9f4262b7ea ("tcp: add SACK compression") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: fuyuanli <fuyuanli@didiglobal.com> Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/netdev/20230529113804.GA20300@didi-ThinkCentre-M920t-N000/ Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230531080150.GA20424@didi-ThinkCentre-M920t-N000 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09nfsd: fix double fget() bug in __write_ports_addfd()Dan Carpenter
[ Upstream commit c034203b6a9dae6751ef4371c18cb77983e30c28 ] The bug here is that you cannot rely on getting the same socket from multiple calls to fget() because userspace can influence that. This is a kind of double fetch bug. The fix is to delete the svc_alien_sock() function and instead do the checking inside the svc_addsock() function. Fixes: 3064639423c4 ("nfsd: check passed socket's net matches NFSd superblock's one") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: NeilBrown <neilb@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09tcp: deny tcp_disconnect() when threads are waitingEric Dumazet
[ Upstream commit 4faeee0cf8a5d88d63cdbc3bab124fb0e6aed08c ] Historically connect(AF_UNSPEC) has been abused by syzkaller and other fuzzers to trigger various bugs. A recent one triggers a divide-by-zero [1], and Paolo Abeni was able to diagnose the issue. tcp_recvmsg_locked() has tests about sk_state being not TCP_LISTEN and TCP REPAIR mode being not used. Then later if socket lock is released in sk_wait_data(), another thread can call connect(AF_UNSPEC), then make this socket a TCP listener. When recvmsg() is resumed, it can eventually call tcp_cleanup_rbuf() and attempt a divide by 0 in tcp_rcv_space_adjust() [1] This patch adds a new socket field, counting number of threads blocked in sk_wait_event() and inet_wait_for_connect(). If this counter is not zero, tcp_disconnect() returns an error. This patch adds code in blocking socket system calls, thus should not hurt performance of non blocking ones. Note that we probably could revert commit 499350a5a6e7 ("tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0") to restore original tcpi_rcv_mss meaning (was 0 if no payload was ever received on a socket) [1] divide error: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 13832 Comm: syz-executor.5 Not tainted 6.3.0-rc4-syzkaller-00224-g00c7b5f4ddc5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 RIP: 0010:tcp_rcv_space_adjust+0x36e/0x9d0 net/ipv4/tcp_input.c:740 Code: 00 00 00 00 fc ff df 4c 89 64 24 48 8b 44 24 04 44 89 f9 41 81 c7 80 03 00 00 c1 e1 04 44 29 f0 48 63 c9 48 01 e9 48 0f af c1 <49> f7 f6 48 8d 04 41 48 89 44 24 40 48 8b 44 24 30 48 c1 e8 03 48 RSP: 0018:ffffc900033af660 EFLAGS: 00010206 RAX: 4a66b76cbade2c48 RBX: ffff888076640cc0 RCX: 00000000c334e4ac RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000001 RBP: 00000000c324e86c R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880766417f8 R13: ffff888028fbb980 R14: 0000000000000000 R15: 0000000000010344 FS: 00007f5bffbfe700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32f25000 CR3: 000000007ced0000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tcp_recvmsg_locked+0x100e/0x22e0 net/ipv4/tcp.c:2616 tcp_recvmsg+0x117/0x620 net/ipv4/tcp.c:2681 inet6_recvmsg+0x114/0x640 net/ipv6/af_inet6.c:670 sock_recvmsg_nosec net/socket.c:1017 [inline] sock_recvmsg+0xe2/0x160 net/socket.c:1038 ____sys_recvmsg+0x210/0x5a0 net/socket.c:2720 ___sys_recvmsg+0xf2/0x180 net/socket.c:2762 do_recvmmsg+0x25e/0x6e0 net/socket.c:2856 __sys_recvmmsg net/socket.c:2935 [inline] __do_sys_recvmmsg net/socket.c:2958 [inline] __se_sys_recvmmsg net/socket.c:2951 [inline] __x64_sys_recvmmsg+0x20f/0x260 net/socket.c:2951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f5c0108c0f9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f5bffbfe168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 00007f5c011ac050 RCX: 00007f5c0108c0f9 RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000003 RBP: 00007f5c010e7b39 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f5c012cfb1f R14: 00007f5bffbfe300 R15: 0000000000022000 </TASK> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Paolo Abeni <pabeni@redhat.com> Diagnosed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20230526163458.2880232-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09efi: Bump stub image version for macOS HVF compatibilityAkihiro Suda
[ Upstream commit 36e4fc57fc1619f462e669e939209c45763bc8f5 ] The macOS hypervisor framework includes a host-side VMM called VZLinuxBootLoader [1] which implements native support for booting the Linux kernel inside a guest directly (instead of, e.g., via GRUB installed inside the guest). On x86, it incorporates a BIOS style loader that does not implement or expose EFI to the loaded kernel. However, this loader appears to fail when the 'image minor version' field in the kernel image's PE/COFF header (which is generally only used by EFI based bootloaders) is set to any value other than 0x0. [2] Commit e346bebbd36b1576 ("efi: libstub: Always enable initrd command line loader and bump version") incremented the EFI stub image minor version to convey that all EFI stub kernels now implement support for the initrd= command line option, and do so in a way where it can load initrd images from any filesystem known to the EFI firmware (as opposed to prior implementations that could only load initrds from the same volume that the kernel image was loaded from). Unfortunately, bumping the version to v1.1 triggers this issue in VZLinuxBootLoader, breaking the boot on x86. So let's keep the image minor version at 0x0, and bump the image major version instead. While at it, convert this field to a bit field, so that individual features are discoverable from it, as suggested by Linus. So let's bump the major version to v3, and document the initrd= command line loading feature as being represented by bit 1 in the mask. Note that, due to the prior interpretation as a monotonically increasing version field, loaders are still permitted to assume that the LoadFile2 initrd loading feature is supported for any major version value >= 1, even if bit 0 is not set. [1] https://developer.apple.com/documentation/virtualization/vzlinuxbootloader [2] https://lore.kernel.org/linux-efi/CAG8fp8Teu4G9JuenQrqGndFt2Gy+V4YgJ=hN1xX7AD940YKf3A@mail.gmail.com/ Fixes: e346bebbd36b1576 ("efi: libstub: Always enable initrd command ...") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217485 Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com> [ardb: rewrite comment and commit log] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-09net/mlx5e: Use query_special_contexts cmd only once per mdevDragos Tatulea
[ Upstream commit 1db1f21caebbb1b6e9b1e7657df613616be3fb49 ] Don't query the firmware so many times (num rqs * num wqes * wqe frags) because it slows down linearly the interface creation time when the product is larger. Do it only once per mdev and store the result in mlx5e_param. Due to helper function being called from different files, move it to an appropriate location. Rename the function with a proper prefix and add a small cleanup. This fix applies only for legacy rq. Fixes: 1b1e4868836a ("net/mlx5e: Use query_special_contexts for mkeys") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05bpf, sockmap: Incorrectly handling copied_seqJohn Fastabend
[ Upstream commit e5c6de5fa025882babf89cecbed80acf49b987fa ] The read_skb() logic is incrementing the tcp->copied_seq which is used for among other things calculating how many outstanding bytes can be read by the application. This results in application errors, if the application does an ioctl(FIONREAD) we return zero because this is calculated from the copied_seq value. To fix this we move tcp->copied_seq accounting into the recv handler so that we update these when the recvmsg() hook is called and data is in fact copied into user buffers. This gives an accurate FIONREAD value as expected and improves ACK handling. Before we were calling the tcp_rcv_space_adjust() which would update 'number of bytes copied to user in last RTT' which is wrong for programs returning SK_PASS. The bytes are only copied to the user when recvmsg is handled. Doing the fix for recvmsg is straightforward, but fixing redirect and SK_DROP pkts is a bit tricker. Build a tcp_psock_eat() helper and then call this from skmsg handlers. This fixes another issue where a broken socket with a BPF program doing a resubmit could hang the receiver. This happened because although read_skb() consumed the skb through sock_drop() it did not update the copied_seq. Now if a single reccv socket is redirecting to many sockets (for example for lb) the receiver sk will be hung even though we might expect it to continue. The hang comes from not updating the copied_seq numbers and memory pressure resulting from that. We have a slight layer problem of calling tcp_eat_skb even if its not a TCP socket. To fix we could refactor and create per type receiver handlers. I decided this is more work than we want in the fix and we already have some small tweaks depending on caller that use the helper skb_bpf_strparser(). So we extend that a bit and always set the strparser bit when it is in use and then we can gate the seq_copied updates on this. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05bpf, sockmap: Improved check for empty queueJohn Fastabend
[ Upstream commit 405df89dd52cbcd69a3cd7d9a10d64de38f854b2 ] We noticed some rare sk_buffs were stepping past the queue when system was under memory pressure. The general theory is to skip enqueueing sk_buffs when its not necessary which is the normal case with a system that is properly provisioned for the task, no memory pressure and enough cpu assigned. But, if we can't allocate memory due to an ENOMEM error when enqueueing the sk_buff into the sockmap receive queue we push it onto a delayed workqueue to retry later. When a new sk_buff is received we then check if that queue is empty. However, there is a problem with simply checking the queue length. When a sk_buff is being processed from the ingress queue but not yet on the sockmap msg receive queue its possible to also recv a sk_buff through normal path. It will check the ingress queue which is zero and then skip ahead of the pkt being processed. Previously we used sock lock from both contexts which made the problem harder to hit, but not impossible. To fix instead of popping the skb from the queue entirely we peek the skb from the queue and do the copy there. This ensures checks to the queue length are non-zero while skb is being processed. Then finally when the entire skb has been copied to user space queue or another socket we pop it off the queue. This way the queue length check allows bypassing the queue only after the list has been completely processed. To reproduce issue we run NGINX compliance test with sockmap running and observe some flakes in our testing that we attributed to this issue. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-5-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05bpf, sockmap: Convert schedule_work into delayed_workJohn Fastabend
[ Upstream commit 29173d07f79883ac94f5570294f98af3d4287382 ] Sk_buffs are fed into sockmap verdict programs either from a strparser (when the user might want to decide how framing of skb is done by attaching another parser program) or directly through tcp_read_sock. The tcp_read_sock is the preferred method for performance when the BPF logic is a stream parser. The flow for Cilium's common use case with a stream parser is, tcp_read_sock() sk_psock_verdict_recv ret = bpf_prog_run_pin_on_cpu() sk_psock_verdict_apply(sock, skb, ret) // if system is under memory pressure or app is slow we may // need to queue skb. Do this queuing through ingress_skb and // then kick timer to wake up handler skb_queue_tail(ingress_skb, skb) schedule_work(work); The work queue is wired up to sk_psock_backlog(). This will then walk the ingress_skb skb list that holds our sk_buffs that could not be handled, but should be OK to run at some later point. However, its possible that the workqueue doing this work still hits an error when sending the skb. When this happens the skbuff is requeued on a temporary 'state' struct kept with the workqueue. This is necessary because its possible to partially send an skbuff before hitting an error and we need to know how and where to restart when the workqueue runs next. Now for the trouble, we don't rekick the workqueue. This can cause a stall where the skbuff we just cached on the state variable might never be sent. This happens when its the last packet in a flow and no further packets come along that would cause the system to kick the workqueue from that side. To fix we could do simple schedule_work(), but while under memory pressure it makes sense to back off some instead of continue to retry repeatedly. So instead to fix convert schedule_work to schedule_delayed_work and add backoff logic to reschedule from backlog queue on errors. Its not obvious though what a good backoff is so use '1'. To test we observed some flakes whil running NGINX compliance test with sockmap we attributed these failed test to this bug and subsequent issue. >From on list discussion. This commit bec217197b41("skmsg: Schedule psock work if the cached skb exists on the psock") was intended to address similar race, but had a couple cases it missed. Most obvious it only accounted for receiving traffic on the local socket so if redirecting into another socket we could still get an sk_buff stuck here. Next it missed the case where copied=0 in the recv() handler and then we wouldn't kick the scheduler. Also its sub-optimal to require userspace to kick the internal mechanisms of sockmap to wake it up and copy data to user. It results in an extra syscall and requires the app to actual handle the EAGAIN correctly. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-3-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05tls: rx: strp: preserve decryption status of skbs when neededJakub Kicinski
[ Upstream commit eca9bfafee3a0487e59c59201ae14c7594ba940a ] When receive buffer is small we try to copy out the data from TCP into a skb maintained by TLS to prevent connection from stalling. Unfortunately if a single record is made up of a mix of decrypted and non-decrypted skbs combining them into a single skb leads to loss of decryption status, resulting in decryption errors or data corruption. Similarly when trying to use TCP receive queue directly we need to make sure that all the skbs within the record have the same status. If we don't the mixed status will be detected correctly but we'll CoW the anchor, again collapsing it into a single paged skb without decrypted status preserved. So the "fixup" code will not know which parts of skb to re-encrypt. Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Shai Amiram <samiram@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-05tls: rx: strp: force mixed decrypted records into copy modeJakub Kicinski
[ Upstream commit 14c4be92ebb3e36e392aa9dd8f314038a9f96f3c ] If a record is partially decrypted we'll have to CoW it, anyway, so go into copy mode and allocate a writable skb right away. This will make subsequent fix simpler because we won't have to teach tls_strp_msg_make_copy() how to copy skbs while preserving decrypt status. Tested-by: Shai Amiram <samiram@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: eca9bfafee3a ("tls: rx: strp: preserve decryption status of skbs when needed") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30page_pool: fix inconsistency for page_pool_ring_[un]lock()Yunsheng Lin
commit 368d3cb406cdd074d1df2ad9ec06d1bfcb664882 upstream. page_pool_ring_[un]lock() use in_softirq() to decide which spin lock variant to use, and when they are called in the context with in_softirq() being false, spin_lock_bh() is called in page_pool_ring_lock() while spin_unlock() is called in page_pool_ring_unlock(), because spin_lock_bh() has disabled the softirq in page_pool_ring_lock(), which causes inconsistency for spin lock pair calling. This patch fixes it by returning in_softirq state from page_pool_producer_lock(), and use it to decide which spin lock variant to use in page_pool_producer_unlock(). As pool->ring has both producer and consumer lock, so rename it to page_pool_producer_[un]lock() to reflect the actual usage. Also move them to page_pool.c as they are only used there, and remove the 'inline' as the compiler may have better idea to do inlining or not. Fixes: 7886244736a4 ("net: page_pool: Add bulk support for ptr_ring") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/r/20230522031714.5089-1-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30net/mlx5: DR, Check force-loopback RC QP capability independently from RoCEYevgeny Kliteynik
commit c7dd225bc224726c22db08e680bf787f60ebdee3 upstream. SW Steering uses RC QP for writing STEs to ICM. This writingis done in LB (loopback), and FL (force-loopback) QP is preferred for performance. FL is available when RoCE is enabled or disabled based on RoCE caps. This patch adds reading of FL capability from HCA caps in addition to the existing reading from RoCE caps, thus fixing the case where we didn't have loopback enabled when RoCE was disabled. Fixes: 7304d603a57a ("net/mlx5: DR, Add support for force-loopback QP") Signed-off-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30ASoC: Intel: Skylake: Fix declaration of enum skl_ch_cfgCezary Rojewski
commit 95109657471311601b98e71f03d0244f48dc61bb upstream. Constant 'C4_CHANNEL' does not exist on the firmware side. Value 0xC is reserved for 'C7_1' instead. Fixes: 04afbbbb1cba ("ASoC: Intel: Skylake: Update the topology interface structure") Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> Link: https://lore.kernel.org/r/20230519201711.4073845-4-amadeuszx.slawinski@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30x86/pci/xen: populate MSI sysfs entriesMaximilian Heyne
commit 335b4223466dd75f9f3ea4918187afbadd22e5c8 upstream. Commit bf5e758f02fc ("genirq/msi: Simplify sysfs handling") reworked the creation of sysfs entries for MSI IRQs. The creation used to be in msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs. Then it moved into __msi_domain_alloc_irqs which is an implementation of domain_alloc_irqs. However, Xen comes with the only other implementation of domain_alloc_irqs and hence doesn't run the sysfs population code anymore. Commit 6c796996ee70 ("x86/pci/xen: Fixup fallout from the PCI/MSI overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info but that doesn't actually have an effect because Xen uses it's own domain_alloc_irqs implementation. Fix this by making use of the fallback functions for sysfs population. Fixes: bf5e758f02fc ("genirq/msi: Simplify sysfs handling") Signed-off-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230503131656.15928-1-mheyne@amazon.de Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30fs: fix undefined behavior in bit shift for SB_NOUSERHao Ge
commit f15afbd34d8fadbd375f1212e97837e32bc170cc upstream. Shifting signed 32-bit value by 31 bits is undefined, so changing significant bit to unsigned. It was spotted by UBSAN. So let's just fix this by using the BIT() helper for all SB_* flags. Fixes: e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") Signed-off-by: Hao Ge <gehao@kylinos.cn> Message-Id: <20230424051835.374204-1-gehao@kylinos.cn> [brauner@kernel.org: use BIT() for all SB_* flags] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30firmware: arm_ffa: Fix FFA device names for logical partitionsSudeep Holla
commit 19b8766459c41c6f318f8a548cc1c66dffd18363 upstream. Each physical partition can provide multiple services each with UUID. Each such service can be presented as logical partition with a unique combination of VM ID and UUID. The number of distinct UUID in a system will be less than or equal to the number of logical partitions. However, currently it fails to register more than one logical partition or service within a physical partition as the device name contains only VM ID while both VM ID and UUID are maintained in the partition information. The kernel complains with the below message: | sysfs: cannot create duplicate filename '/devices/arm-ffa-8001' | CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc7 #8 | Hardware name: FVP Base RevC (DT) | Call trace: | dump_backtrace+0xf8/0x118 | show_stack+0x18/0x24 | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | sysfs_create_dir_ns+0xe0/0x13c | kobject_add_internal+0x220/0x3d4 | kobject_add+0x94/0x100 | device_add+0x144/0x5d8 | device_register+0x20/0x30 | ffa_device_register+0x88/0xd8 | ffa_setup_partitions+0x108/0x1b8 | ffa_init+0x2ec/0x3a4 | do_one_initcall+0xcc/0x240 | do_initcall_level+0x8c/0xac | do_initcalls+0x54/0x94 | do_basic_setup+0x1c/0x28 | kernel_init_freeable+0x100/0x16c | kernel_init+0x20/0x1a0 | ret_from_fork+0x10/0x20 | kobject_add_internal failed for arm-ffa-8001 with -EEXIST, don't try to | register things with the same name in the same directory. | arm_ffa arm-ffa: unable to register device arm-ffa-8001 err=-17 | ARM FF-A: ffa_setup_partitions: failed to register partition ID 0x8001 By virtue of being random enough to avoid collisions when generated in a distributed system, there is no way to compress UUID keys to the number of bits required to identify each. We can eliminate '-' in the name but it is not worth eliminating 4 bytes and add unnecessary logic for doing that. Also v1.0 doesn't provide the UUID of the partitions which makes it hard to use the same for the device name. So to keep it simple, let us alloc an ID using ida_alloc() and append the same to "arm-ffa" to make up a unique device name. Also stash the id value in ffa_dev to help freeing the ID later when the device is destroyed. Fixes: e781858488b9 ("firmware: arm_ffa: Add initial FFA bus support for device enumeration") Reported-by: Lucian Paul-Trifu <lucian.paul-trifu@arm.com> Link: https://lore.kernel.org/r/20230419-ffa_fixes_6-4-v2-3-d9108e43a176@arm.com Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30power: supply: bq27xxx: Ensure power_supply_changed() is called on current ↵Hans de Goede
sign changes commit 939a116142012926e25de0ea6b7e2f8d86a5f1b6 upstream. On gauges where the current register is signed, there is no charging flag in the flags register. So only checking flags will not result in power_supply_changed() getting called when e.g. a charger is plugged in and the current sign changes from negative (discharging) to positive (charging). This causes userspace's notion of the status to lag until userspace does a poll. And when a power_supply_leds.c LED trigger is used to indicate charging status with a LED, this LED will lag until the capacity percentage changes, which may take many minutes (because the LED trigger only is updated on power_supply_changed() calls). Fix this by calling bq27xxx_battery_current_and_status() on gauges with a signed current register and checking if the status has changed. Fixes: 297a533b3e62 ("bq27x00: Cache battery registers") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30power: supply: bq27xxx: Fix poll_interval handling and races on removeHans de Goede
commit c00bc80462afc7963f449d7f21d896d2f629cacc upstream. Before this patch bq27xxx_battery_teardown() was setting poll_interval = 0 to avoid bq27xxx_battery_update() requeuing the delayed_work item. There are 2 problems with this: 1. If the driver is unbound through sysfs, rather then the module being rmmod-ed, this changes poll_interval unexpectedly 2. This is racy, after it being set poll_interval could be changed before bq27xxx_battery_update() checks it through /sys/module/bq27xxx_battery/parameters/poll_interval Fix this by added a removed attribute to struct bq27xxx_device_info and using that instead of setting poll_interval to 0. There also is another poll_interval related race on remove(), writing /sys/module/bq27xxx_battery/parameters/poll_interval will requeue the delayed_work item for all devices on the bq27xxx_battery_devices list and the device being removed was only removed from that list after cancelling the delayed_work item. Fix this by moving the removal from the bq27xxx_battery_devices list to before cancelling the delayed_work item. Fixes: 8cfaaa811894 ("bq27x00_battery: Fix OOPS caused by unregistring bq27x00 driver") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30drm: fix drmm_mutex_init()Matthew Auld
commit c21f11d182c2180d8b90eaff84f574cfa845b250 upstream. In mutex_init() lockdep identifies a lock by defining a special static key for each lock class. However if we wrap the macro in a function, like in drmm_mutex_init(), we end up generating: int drmm_mutex_init(struct drm_device *dev, struct mutex *lock) { static struct lock_class_key __key; __mutex_init((lock), "lock", &__key); .... } The static __key here is what lockdep uses to identify the lock class, however since this is just a normal function the key here will be created once, where all callers then use the same key. In effect the mutex->depmap.key will be the same pointer for different drmm_mutex_init() callers. This then results in impossible lockdep splats since lockdep thinks completely unrelated locks are the same lock class. To fix this turn drmm_mutex_init() into a macro such that it generates a different "static struct lock_class_key __key" for each invocation, which looks to be inline with what mutex_init() wants. v2: - Revamp the commit message with clearer explanation of the issue. - Rather export __drmm_mutex_release() than static inline. Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reported-by: Sarah Walker <sarah.walker@imgtec.com> Fixes: e13f13e039dc ("drm: Add DRM-managed mutex_init()") Cc: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Jocelyn Falempe <jfalempe@redhat.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230519090733.489019-1-matthew.auld@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30USB: core: Add routines for endpoint checks in old driversAlan Stern
commit 13890626501ffda22b18213ddaf7930473da5792 upstream. Many of the older USB drivers in the Linux USB stack were written based simply on a vendor's device specification. They use the endpoint information in the spec and assume these endpoints will always be present, with the properties listed, in any device matching the given vendor and product IDs. While that may have been true back then, with spoofing and fuzzing it is not true any more. More and more we are finding that those old drivers need to perform at least a minimum of checking before they try to use any endpoint other than ep0. To make this checking as simple as possible, we now add a couple of utility routines to the USB core. usb_check_bulk_endpoints() and usb_check_int_endpoints() take an interface pointer together with a list of endpoint addresses (numbers and directions). They check that the interface's current alternate setting includes endpoints with those addresses and that each of these endpoints has the right type: bulk or interrupt, respectively. Although we already have usb_find_common_endpoints() and related routines meant for a similar purpose, they are not well suited for this kind of checking. Those routines find endpoints of various kinds, but only one (either the first or the last) of each kind, and they don't verify that the endpoints' addresses agree with what the caller expects. In theory the new routines could be more general: They could take a particular altsetting as their argument instead of always using the interface's current altsetting. In practice I think this won't matter too much; multiple altsettings tend to be used for transferring media (audio or visual) over isochronous endpoints, not bulk or interrupt. Drivers for such devices will generally require more sophisticated checking than these simplistic routines provide. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/dd2c8e8c-2c87-44ea-ba17-c64b97e201c9@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30net: fix stack overflow when LRO is disabled for virtual interfacesTaehee Yoo
commit ae9b15fbe63447bc1d3bba3769f409d17ca6fdf6 upstream. When the virtual interface's feature is updated, it synchronizes the updated feature for its own lower interface. This propagation logic should be worked as the iteration, not recursively. But it works recursively due to the netdev notification unexpectedly. This problem occurs when it disables LRO only for the team and bonding interface type. team0 | +------+------+-----+-----+ | | | | | team1 team2 team3 ... team200 If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE event to its own lower interfaces(team1 ~ team200). It is worked by netdev_sync_lower_features(). So, the NETDEV_FEAT_CHANGE notification logic of each lower interface work iteratively. But generated NETDEV_FEAT_CHANGE event is also sent to the upper interface too. upper interface(team0) generates the NETDEV_FEAT_CHANGE event for its own lower interfaces again. lower and upper interfaces receive this event and generate this event again and again. So, the stack overflow occurs. But it is not the infinite loop issue. Because the netdev_sync_lower_features() updates features before generating the NETDEV_FEAT_CHANGE event. Already synchronized lower interfaces skip notification logic. So, it is just the problem that iteration logic is changed to the recursive unexpectedly due to the notification mechanism. Reproducer: ip link add team0 type team ethtool -K team0 lro on for i in {1..200} do ip link add team$i master team0 type team ethtool -K team$i lro on done ethtool -K team0 lro off In order to fix it, the notifier_ctx member of bonding/team is introduced. Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20230517143010.3596250-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30ipv{4,6}/raw: fix output xfrm lookup wrt protocolNicolas Dichtel
commit 3632679d9e4f879f49949bb5b050e0de553e4739 upstream. With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the protocol field of the flow structure, build by raw_sendmsg() / rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy lookup when some policies are defined with a protocol in the selector. For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to specify the protocol. Just accept all values for IPPROTO_RAW socket. For ipv4, the sin_port field of 'struct sockaddr_in' could not be used without breaking backward compatibility (the value of this field was never checked). Let's add a new kind of control message, so that the userland could specify which protocol is used. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") CC: stable@vger.kernel.org Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30mm/vmemmap/devdax: fix kernel crash when probing devdax devicesAneesh Kumar K.V
commit 87a7ae75d7383afa998f57656d1d14e2a730cc47 upstream. commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") added support for using optimized vmmemap for devdax devices. But how vmemmap mappings are created are architecture specific. For example, powerpc with hash translation doesn't have vmemmap mappings in init_mm page table instead they are bolted table entries in the hardware page table vmemmap_populate_compound_pages() used by vmemmap optimization code is not aware of these architecture-specific mapping. Hence allow architecture to opt for this feature. I selected architectures supporting HUGETLB_PAGE_OPTIMIZE_VMEMMAP option as also supporting this feature. This patch fixes the below crash on ppc64. BUG: Unable to handle kernel data access on write at 0xc00c000100400038 Faulting instruction address: 0xc000000001269d90 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries NIP: c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000 REGS: c000000003632c30 TRAP: 0300 Not tainted (6.3.0-rc5-150500.34-default+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24842228 XER: 00000000 CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0 .... NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0 Call Trace: [c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable) [c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250 [c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0 [c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0 [c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0 [c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140 [c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520 [c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230 [c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120 [c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0 [c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130 [c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250 [c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110 [c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10 [c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530 [c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270 [c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70 [c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0 [c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520 [c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230 [c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120 [c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240 [c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130 [c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50 [c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300 [c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0 [c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100 [c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48 [c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320 [c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400 [c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0 [c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64 Link: https://lkml.kernel.org/r/20230411142214.64464-1-aneesh.kumar@linux.ibm.com Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Tarun Sahu <tsahu@linux.ibm.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Piyush Sachdeva <piyushs@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-30tpm: Prevent hwrng from activating during resumeJarkko Sakkinen
[ Upstream commit 99d46450625590d410f86fe4660a5eff7d3b8343 ] Set TPM_CHIP_FLAG_SUSPENDED in tpm_pm_suspend() and reset in tpm_pm_resume(). While the flag is set, tpm_hwrng() gives back zero bytes. This prevents hwrng from racing during resume. Cc: stable@vger.kernel.org Fixes: 6e592a065d51 ("tpm: Move Linux RNG connection to hwrng") Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-30tpm: Re-enable TPM chip boostrapping non-tpm_tis TPM driversJarkko Sakkinen
[ Upstream commit 0c8862de05c1a087795ee0a87bf61a6394306cc0 ] TPM chip bootstrapping was removed from tpm_chip_register(), and it was relocated to tpm_tis_core. This breaks all drivers which are not based on tpm_tis because the chip will not get properly initialized. Take the corrective steps: 1. Rename tpm_chip_startup() as tpm_chip_bootstrap() and make it one-shot. 2. Call tpm_chip_bootstrap() in tpm_chip_register(), which reverts the things as tehy used to be. Cc: Lino Sanfilippo <l.sanfilippo@kunbus.com> Fixes: 548eb516ec0f ("tpm, tpm_tis: startup chip before testing for interrupts") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Link: https://lore.kernel.org/all/ZEjqhwHWBnxcaRV5@xpf.sh.intel.com/ Tested-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Stable-dep-of: 99d464506255 ("tpm: Prevent hwrng from activating during resume") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24SUNRPC: always free ctxt when freeing deferred requestNeilBrown
[ Upstream commit 948f072ada23e0a504c5e4d7d71d4c83bd0785ec ] Since the ->xprt_ctxt pointer was added to svc_deferred_req, it has not been sufficient to use kfree() to free a deferred request. We may need to free the ctxt as well. As freeing the ctxt is all that ->xpo_release_rqst() does, we repurpose it to explicit do that even when the ctxt is not stored in an rqst. So we now have ->xpo_release_ctxt() which is given an xprt and a ctxt, which may have been taken either from an rqst or from a dreq. The caller is now responsible for clearing that pointer after the call to ->xpo_release_ctxt. We also clear dr->xprt_ctxt when the ctxt is moved into a new rqst when revisiting a deferred request. This ensures there is only one pointer to the ctxt, so the risk of double freeing in future is reduced. The new code in svc_xprt_release which releases both the ctxt and any rq_deferred depends on this. Fixes: 773f91b2cf3f ("SUNRPC: Fix NFSD's request deferral on RDMA transports") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24sched: Fix KCSAN noinstr violationJosh Poimboeuf
[ Upstream commit e0b081d17a9f4e5c0cbb0e5fbeb1abe3de0f7e4e ] With KCSAN enabled, end_of_stack() can get out-of-lined. Force it inline. Fixes the following warnings: vmlinux.o: warning: objtool: check_stackleak_irqoff+0x2b: call to end_of_stack() leaves .noinstr.text section Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/cc1b4d73d3a428a00d206242a68fdf99a934ca7b.1681320026.git.jpoimboe@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24Bluetooth: Add new quirk for broken set random RPA timeout for ATS2851Raul Cheleguini
[ Upstream commit 91b6d02ddcd113352bdd895990b252065c596de7 ] The ATS2851 based controller advertises support for command "LE Set Random Private Address Timeout" but does not actually implement it, impeding the controller initialization. Add the quirk HCI_QUIRK_BROKEN_SET_RPA_TIMEOUT to unblock the controller initialization. < HCI Command: LE Set Resolvable Private... (0x08|0x002e) plen 2 Timeout: 900 seconds > HCI Event: Command Status (0x0f) plen 4 LE Set Resolvable Private Address Timeout (0x08|0x002e) ncmd 1 Status: Unknown HCI Command (0x01) Co-developed-by: imoc <wzj9912@gmail.com> Signed-off-by: imoc <wzj9912@gmail.com> Signed-off-by: Raul Cheleguini <raul.cheleguini@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24Bluetooth: Add new quirk for broken local ext features page 2Vasily Khoruzhick
[ Upstream commit 8194f1ef5a815aea815a91daf2c721eab2674f1f ] Some adapters (e.g. RTL8723CS) advertise that they have more than 2 pages for local ext features, but they don't support any features declared in these pages. RTL8723CS reports max_page = 2 and declares support for sync train and secure connection, but it responds with either garbage or with error in status on corresponding commands. Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Bastian Germann <bage@debian.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24ipvs: Update width of source for ip_vs_sync_conn_optionsSimon Horman
[ Upstream commit e3478c68f6704638d08f437cbc552ca5970c151a ] In ip_vs_sync_conn_v0() copy is made to struct ip_vs_sync_conn_options. That structure looks like this: struct ip_vs_sync_conn_options { struct ip_vs_seq in_seq; struct ip_vs_seq out_seq; }; The source of the copy is the in_seq field of struct ip_vs_conn. Whose type is struct ip_vs_seq. Thus we can see that the source - is not as wide as the amount of data copied, which is the width of struct ip_vs_sync_conn_option. The copy is safe because the next field in is another struct ip_vs_seq. Make use of struct_group() to annotate this. Flagged by gcc-13 as: In file included from ./include/linux/string.h:254, from ./include/linux/bitmap.h:11, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/paravirt.h:17, from ./arch/x86/include/asm/cpuid.h:62, from ./arch/x86/include/asm/processor.h:19, from ./arch/x86/include/asm/timex.h:5, from ./include/linux/timex.h:67, from ./include/linux/time32.h:13, from ./include/linux/time.h:60, from ./include/linux/stat.h:19, from ./include/linux/module.h:13, from net/netfilter/ipvs/ip_vs_sync.c:38: In function 'fortify_memcpy_chk', inlined from 'ip_vs_sync_conn_v0' at net/netfilter/ipvs/ip_vs_sync.c:606:3: ./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 529 | __read_overflow2_field(q_size_field, size); | Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24net/sched: pass netlink extack to mqprio and taprio offloadVladimir Oltean
[ Upstream commit c54876cd5961ce0f8e74807f79a6739cd6b35ddf ] With the multiplexed ndo_setup_tc() model which lacks a first-class struct netlink_ext_ack * argument, the only way to pass the netlink extended ACK message down to the device driver is to embed it within the offload structure. Do this for struct tc_mqprio_qopt_offload and struct tc_taprio_qopt_offload. Since struct tc_taprio_qopt_offload also contains a tc_mqprio_qopt_offload structure, and since device drivers might effectively reuse their mqprio implementation for the mqprio portion of taprio, we make taprio set the extack in both offload structures to point at the same netlink extack message. In fact, the taprio handling is a bit more tricky, for 2 reasons. First is because the offload structure has a longer lifetime than the extack structure. The driver is supposed to populate the extack synchronously from ndo_setup_tc() and leave it alone afterwards. To not have any use-after-free surprises, we zero out the extack pointer when we leave taprio_enable_offload(). The second reason is because taprio does overwrite the extack message on ndo_setup_tc() error. We need to switch to the weak form of setting an extack message, which preserves a potential message set by the driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24netdev: Enforce index cap in netdev_get_tx_queueNick Child
[ Upstream commit 1cc6571f562774f1d928dc8b3cff50829b86e970 ] When requesting a TX queue at a given index, warn on out-of-bounds referencing if the index is greater than the allocated number of queues. Specifically, since this function is used heavily in the networking stack use DEBUG_NET_WARN_ON_ONCE to avoid executing a new branch on every packet. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://lore.kernel.org/r/20230321150725.127229-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4Shanker Donthineni
[ Upstream commit 35727af2b15d98a2dd2811d631d3a3886111312e ] The T241 platform suffers from the T241-FABRIC-4 erratum which causes unexpected behavior in the GIC when multiple transactions are received simultaneously from different sources. This hardware issue impacts NVIDIA server platforms that use more than two T241 chips interconnected. Each chip has support for 320 {E}SPIs. This issue occurs when multiple packets from different GICs are incorrectly interleaved at the target chip. The erratum text below specifies exactly what can cause multiple transfer packets susceptible to interleaving and GIC state corruption. GIC state corruption can lead to a range of problems, including kernel panics, and unexpected behavior. >From the erratum text: "In some cases, inter-socket AXI4 Stream packets with multiple transfers, may be interleaved by the fabric when presented to ARM Generic Interrupt Controller. GIC expects all transfers of a packet to be delivered without any interleaving. The following GICv3 commands may result in multiple transfer packets over inter-socket AXI4 Stream interface: - Register reads from GICD_I* and GICD_N* - Register writes to 64-bit GICD registers other than GICD_IROUTERn* - ITS command MOVALL Multiple commands in GICv4+ utilize multiple transfer packets, including VMOVP, VMOVI, VMAPP, and 64-bit register accesses." This issue impacts system configurations with more than 2 sockets, that require multi-transfer packets to be sent over inter-socket AXI4 Stream interface between GIC instances on different sockets. GICv4 cannot be supported. GICv3 SW model can only be supported with the workaround. Single and Dual socket configurations are not impacted by this issue and support GICv3 and GICv4." Link: https://developer.nvidia.com/docs/t241-fabric-4/nvidia-t241-fabric-4-errata.pdf Writing to the chip alias region of the GICD_In{E} registers except GICD_ICENABLERn has an equivalent effect as writing to the global distributor. The SPI interrupt deactivate path is not impacted by the erratum. To fix this problem, implement a workaround that ensures read accesses to the GICD_In{E} registers are directed to the chip that owns the SPI, and disable GICv4.x features. To simplify code changes, the gic_configure_irq() function uses the same alias region for both read and write operations to GICD_ICFGR. Co-developed-by: Vikram Sethi <vsethi@nvidia.com> Signed-off-by: Vikram Sethi <vsethi@nvidia.com> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> (for SMCCC/SOC ID bits) Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230319024314.3540573-2-sdonthineni@nvidia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24firmware: arm_sdei: Fix sleep from invalid context BUGPierre Gondois
[ Upstream commit d2c48b2387eb89e0bf2a2e06e30987cf410acad4 ] Running a preempt-rt (v6.2-rc3-rt1) based kernel on an Ampere Altra triggers: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 24, name: cpuhp/0 preempt_count: 0, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by cpuhp/0/24: #0: ffffda30217c70d0 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248 #1: ffffda30217c7120 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x5c/0x248 #2: ffffda3021c711f0 (sdei_list_lock){....}-{3:3}, at: sdei_cpuhp_up+0x3c/0x130 irq event stamp: 36 hardirqs last enabled at (35): [<ffffda301e85b7bc>] finish_task_switch+0xb4/0x2b0 hardirqs last disabled at (36): [<ffffda301e812fec>] cpuhp_thread_fun+0x21c/0x248 softirqs last enabled at (0): [<ffffda301e80b184>] copy_process+0x63c/0x1ac0 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 0 PID: 24 Comm: cpuhp/0 Not tainted 5.19.0-rc3-rt5-[...] Hardware name: WIWYNN Mt.Jade Server [...] Call trace: dump_backtrace+0x114/0x120 show_stack+0x20/0x70 dump_stack_lvl+0x9c/0xd8 dump_stack+0x18/0x34 __might_resched+0x188/0x228 rt_spin_lock+0x70/0x120 sdei_cpuhp_up+0x3c/0x130 cpuhp_invoke_callback+0x250/0xf08 cpuhp_thread_fun+0x120/0x248 smpboot_thread_fn+0x280/0x320 kthread+0x130/0x140 ret_from_fork+0x10/0x20 sdei_cpuhp_up() is called in the STARTING hotplug section, which runs with interrupts disabled. Use a CPUHP_AP_ONLINE_DYN entry instead to execute the cpuhp cb later, with preemption enabled. SDEI originally got its own cpuhp slot to allow interacting with perf. It got superseded by pNMI and this early slot is not relevant anymore. [1] Some SDEI calls (e.g. SDEI_1_0_FN_SDEI_PE_MASK) take actions on the calling CPU. It is checked that preemption is disabled for them. _ONLINE cpuhp cb are executed in the 'per CPU hotplug thread'. Preemption is enabled in those threads, but their cpumask is limited to 1 CPU. Move 'WARN_ON_ONCE(preemptible())' statements so that SDEI cpuhp cb don't trigger them. Also add a check for the SDEI_1_0_FN_SDEI_PRIVATE_RESET SDEI call which acts on the calling CPU. [1]: https://lore.kernel.org/all/5813b8c5-ae3e-87fd-fccc-94c9cd08816d@arm.com/ Suggested-by: James Morse <james.morse@arm.com> Signed-off-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20230216084920.144064-1-pierre.gondois@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24open: return EINVAL for O_DIRECTORY | O_CREATChristian Brauner
[ Upstream commit 43b450632676fb60e9faeddff285d9fac94a4f58 ] After a couple of years and multiple LTS releases we received a report that the behavior of O_DIRECTORY | O_CREAT changed starting with v5.7. On kernels prior to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL had the following semantics: (1) open("/tmp/d", O_DIRECTORY | O_CREAT) * d doesn't exist: create regular file * d exists and is a regular file: ENOTDIR * d exists and is a directory: EISDIR (2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL) * d doesn't exist: create regular file * d exists and is a regular file: EEXIST * d exists and is a directory: EEXIST (3) open("/tmp/d", O_DIRECTORY | O_EXCL) * d doesn't exist: ENOENT * d exists and is a regular file: ENOTDIR * d exists and is a directory: open directory On kernels since to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL have the following semantics: (1) open("/tmp/d", O_DIRECTORY | O_CREAT) * d doesn't exist: ENOTDIR (create regular file) * d exists and is a regular file: ENOTDIR * d exists and is a directory: EISDIR (2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL) * d doesn't exist: ENOTDIR (create regular file) * d exists and is a regular file: EEXIST * d exists and is a directory: EEXIST (3) open("/tmp/d", O_DIRECTORY | O_EXCL) * d doesn't exist: ENOENT * d exists and is a regular file: ENOTDIR * d exists and is a directory: open directory This is a fairly substantial semantic change that userspace didn't notice until Pedro took the time to deliberately figure out corner cases. Since no one noticed this breakage we can somewhat safely assume that O_DIRECTORY | O_CREAT combinations are likely unused. The v5.7 breakage is especially weird because while ENOTDIR is returned indicating failure a regular file is actually created. This doesn't make a lot of sense. Time was spent finding potential users of this combination. Searching on codesearch.debian.net showed that codebases often express semantical expectations about O_DIRECTORY | O_CREAT which are completely contrary to what our code has done and currently does. The expectation often is that this particular combination would create and open a directory. This suggests users who tried to use that combination would stumble upon the counterintuitive behavior no matter if pre-v5.7 or post v5.7 and quickly realize neither semantics give them what they want. For some examples see the code examples in [1] to [3] and the discussion in [4]. There are various ways to address this issue. The lazy/simple option would be to restore the pre-v5.7 behavior and to just live with that bug forever. But since there's a real chance that the O_DIRECTORY | O_CREAT quirk isn't relied upon we should try to get away with murder(ing bad semantics) first. If we need to Frankenstein pre-v5.7 behavior later so be it. So let's simply return EINVAL categorically for O_DIRECTORY | O_CREAT combinations. In addition to cleaning up the old bug this also opens up the possiblity to make that flag combination do something more intuitive in the future. Starting with this commit the following semantics apply: (1) open("/tmp/d", O_DIRECTORY | O_CREAT) * d doesn't exist: EINVAL * d exists and is a regular file: EINVAL * d exists and is a directory: EINVAL (2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL) * d doesn't exist: EINVAL * d exists and is a regular file: EINVAL * d exists and is a directory: EINVAL (3) open("/tmp/d", O_DIRECTORY | O_EXCL) * d doesn't exist: ENOENT * d exists and is a regular file: ENOTDIR * d exists and is a directory: open directory One additional note, O_TMPFILE is implemented as: #define __O_TMPFILE 020000000 #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY) #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT) For older kernels it was important to return an explicit error when O_TMPFILE wasn't supported. So O_TMPFILE requires that O_DIRECTORY is raised alongside __O_TMPFILE. It also enforced that O_CREAT wasn't specified. Since O_DIRECTORY | O_CREAT could be used to create a regular allowing that combination together with __O_TMPFILE would've meant that false positives were possible, i.e., that a regular file was created instead of a O_TMPFILE. This could've been used to trick userspace into thinking it operated on a O_TMPFILE when it wasn't. Now that we block O_DIRECTORY | O_CREAT completely the check for O_CREAT in the __O_TMPFILE branch via if ((flags & O_TMPFILE_MASK) != O_TMPFILE) can be dropped. Instead we can simply check verify that O_DIRECTORY is raised via if (!(flags & O_DIRECTORY)) and explain this in two comments. As Aleksa pointed out O_PATH is unaffected by this change since it always returned EINVAL if O_CREAT was specified - with or without O_DIRECTORY. Link: https://lore.kernel.org/lkml/20230320071442.172228-1-pedro.falcato@gmail.com Link: https://sources.debian.org/src/flatpak/1.14.4-1/subprojects/libglnx/glnx-dirfd.c/?hl=324#L324 [1] Link: https://sources.debian.org/src/flatpak-builder/1.2.3-1/subprojects/libglnx/glnx-shutil.c/?hl=251#L251 [2] Link: https://sources.debian.org/src/ostree/2022.7-2/libglnx/glnx-dirfd.c/?hl=324#L324 [3] Link: https://www.openwall.com/lists/oss-security/2014/11/26/14 [4] Reported-by: Pedro Falcato <pedro.falcato@gmail.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24net: add vlan_get_protocol_and_depth() helperEric Dumazet
[ Upstream commit 4063384ef762cc5946fc7a3f89879e76c6ec51e2 ] Before blamed commit, pskb_may_pull() was used instead of skb_header_pointer() in __vlan_get_protocol() and friends. Few callers depended on skb->head being populated with MAC header, syzbot caught one of them (skb_mac_gso_segment()) Add vlan_get_protocol_and_depth() to make the intent clearer and use it where sensible. This is a more generic fix than commit e9d3f80935b6 ("net/af_packet: make sure to pull mac header") which was dealing with a similar issue. kernel BUG at include/linux/skbuff.h:2655 ! invalid opcode: 0000 [#1] SMP KASAN CPU: 0 PID: 1441 Comm: syz-executor199 Not tainted 6.1.24-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023 RIP: 0010:__skb_pull include/linux/skbuff.h:2655 [inline] RIP: 0010:skb_mac_gso_segment+0x68f/0x6a0 net/core/gro.c:136 Code: fd 48 8b 5c 24 10 44 89 6b 70 48 c7 c7 c0 ae 0d 86 44 89 e6 e8 a1 91 d0 00 48 c7 c7 00 af 0d 86 48 89 de 31 d2 e8 d1 4a e9 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 RSP: 0018:ffffc90001bd7520 EFLAGS: 00010286 RAX: ffffffff8469736a RBX: ffff88810f31dac0 RCX: ffff888115a18b00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90001bd75e8 R08: ffffffff84697183 R09: fffff5200037adf9 R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000012 R13: 000000000000fee5 R14: 0000000000005865 R15: 000000000000fed7 FS: 000055555633f300(0000) GS:ffff8881f6a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 0000000116fea000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> [<ffffffff847018dd>] __skb_gso_segment+0x32d/0x4c0 net/core/dev.c:3419 [<ffffffff8470398a>] skb_gso_segment include/linux/netdevice.h:4819 [inline] [<ffffffff8470398a>] validate_xmit_skb+0x3aa/0xee0 net/core/dev.c:3725 [<ffffffff84707042>] __dev_queue_xmit+0x1332/0x3300 net/core/dev.c:4313 [<ffffffff851a9ec7>] dev_queue_xmit+0x17/0x20 include/linux/netdevice.h:3029 [<ffffffff851b4a82>] packet_snd net/packet/af_packet.c:3111 [inline] [<ffffffff851b4a82>] packet_sendmsg+0x49d2/0x6470 net/packet/af_packet.c:3142 [<ffffffff84669a12>] sock_sendmsg_nosec net/socket.c:716 [inline] [<ffffffff84669a12>] sock_sendmsg net/socket.c:736 [inline] [<ffffffff84669a12>] __sys_sendto+0x472/0x5f0 net/socket.c:2139 [<ffffffff84669c75>] __do_sys_sendto net/socket.c:2151 [inline] [<ffffffff84669c75>] __se_sys_sendto net/socket.c:2147 [inline] [<ffffffff84669c75>] __x64_sys_sendto+0xe5/0x100 net/socket.c:2147 [<ffffffff8551d40f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff8551d40f>] do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80 [<ffffffff85600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 469aceddfa3e ("vlan: consolidate VLAN parsing code and limit max parsing depth") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24bonding: fix send_peer_notif overflowHangbin Liu
[ Upstream commit 9949e2efb54eb3001cb2f6512ff3166dddbfb75d ] Bonding send_peer_notif was defined as u8. Since commit 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications"). the bond->send_peer_notif will be num_peer_notif multiplied by peer_notif_delay, which is u8 * u32. This would cause the send_peer_notif overflow easily. e.g. ip link add bond0 type bond mode 1 miimon 100 num_grat_arp 30 peer_notify_delay 1000 To fix the overflow, let's set the send_peer_notif to u32 and limit peer_notif_delay to 300s. Reported-by: Liang Li <liali@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2090053 Fixes: 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24net: Fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().Kuniyuki Iwashima
[ Upstream commit dfd9248c071a3710c24365897459538551cb7167 ] KCSAN found a data race in sock_recv_cmsgs() where the read access to sk->sk_stamp needs READ_ONCE(). BUG: KCSAN: data-race in packet_recvmsg / packet_recvmsg write (marked) to 0xffff88803c81f258 of 8 bytes by task 19171 on cpu 0: sock_write_timestamp include/net/sock.h:2670 [inline] sock_recv_cmsgs include/net/sock.h:2722 [inline] packet_recvmsg+0xb97/0xd00 net/packet/af_packet.c:3489 sock_recvmsg_nosec net/socket.c:1019 [inline] sock_recvmsg+0x11a/0x130 net/socket.c:1040 sock_read_iter+0x176/0x220 net/socket.c:1118 call_read_iter include/linux/fs.h:1845 [inline] new_sync_read fs/read_write.c:389 [inline] vfs_read+0x5e0/0x630 fs/read_write.c:470 ksys_read+0x163/0x1a0 fs/read_write.c:613 __do_sys_read fs/read_write.c:623 [inline] __se_sys_read fs/read_write.c:621 [inline] __x64_sys_read+0x41/0x50 fs/read_write.c:621 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc read to 0xffff88803c81f258 of 8 bytes by task 19183 on cpu 1: sock_recv_cmsgs include/net/sock.h:2721 [inline] packet_recvmsg+0xb64/0xd00 net/packet/af_packet.c:3489 sock_recvmsg_nosec net/socket.c:1019 [inline] sock_recvmsg+0x11a/0x130 net/socket.c:1040 sock_read_iter+0x176/0x220 net/socket.c:1118 call_read_iter include/linux/fs.h:1845 [inline] new_sync_read fs/read_write.c:389 [inline] vfs_read+0x5e0/0x630 fs/read_write.c:470 ksys_read+0x163/0x1a0 fs/read_write.c:613 __do_sys_read fs/read_write.c:623 [inline] __se_sys_read fs/read_write.c:621 [inline] __x64_sys_read+0x41/0x50 fs/read_write.c:621 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc value changed: 0xffffffffc4653600 -> 0x0000000000000000 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 19183 Comm: syz-executor.5 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: 6c7c98bad488 ("sock: avoid dirtying sk_stamp, if possible") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230508175543.55756-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24linux/dim: Do nothing if no time delta between samplesRoy Novich
[ Upstream commit 162bd18eb55adf464a0fa2b4144b8d61c75ff7c2 ] Add return value for dim_calc_stats. This is an indication for the caller if curr_stats was assigned by the function. Avoid using curr_stats uninitialized over {rdma/net}_dim, when no time delta between samples. Coverity reported this potential use of an uninitialized variable. Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux") Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://lore.kernel.org/r/20230507135743.138993-1-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-24drm/dsc: fix DP_DSC_MAX_BPP_DELTA_* macro valuesJani Nikula
[ Upstream commit 0d68683838f2850dd8ff31f1121e05bfb7a2def0 ] The macro values just don't match the specs. Fix them. Fixes: 1482ec00be4a ("drm: Add missing DP DSC extended capability definitions.") Cc: Vinod Govindapillai <vinod.govindapillai@intel.com> Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406134615.1422509-2-jani.nikula@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17x86/amd_nb: Add PCI ID for family 19h model 78hMario Limonciello
commit 23a5b8bb022c1e071ca91b1a9c10f0ad6a0966e9 upstream. Commit 310e782a99c7 ("platform/x86/amd: pmc: Utilize SMN index 0 for driver probe") switched to using amd_smn_read() which relies upon the misc PCI ID used by DF function 3 being included in a table. The ID for model 78h is missing in that table, so amd_smn_read() doesn't work. Add the missing ID into amd_nb, restoring s2idle on this system. [ bp: Simplify commit message. ] Fixes: 310e782a99c7 ("platform/x86/amd: pmc: Utilize SMN index 0 for driver probe") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h Acked-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20230427053338.16653-2-mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17drm/dsc: fix drm_edp_dsc_sink_output_bpp() DPCD high byte usageJani Nikula
commit 13525645e2246ebc8a21bd656248d86022a6ee8f upstream. The operator precedence between << and & is wrong, leading to the high byte being completely ignored. For example, with the 6.4 format, 32 becomes 0 and 24 becomes 8. Fix it, and remove the slightly confusing and unnecessary DP_DSC_MAX_BITS_PER_PIXEL_HI_SHIFT macro while at it. Fixes: 0575650077ea ("drm/dp: DRM DP helper/macros to get DP sink DSC parameters") Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Cc: Manasi Navare <navaremanasi@google.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Cc: <stable@vger.kernel.org> # v5.0+ Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406134615.1422509-1-jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-05-17i2c: gxp: fix build failure without CONFIG_I2C_SLAVEArnd Bergmann
[ Upstream commit 5d388143fa6c351d985ffd23ea50c91c8839141b ] The gxp_i2c_slave_irq_handler() is hidden in an #ifdef, but the caller uses an IS_ENABLED() check: drivers/i2c/busses/i2c-gxp.c: In function 'gxp_i2c_irq_handler': drivers/i2c/busses/i2c-gxp.c:467:29: error: implicit declaration of function 'gxp_i2c_slave_irq_handler'; did you mean 'gxp_i2c_irq_handler'? [-Werror=implicit-function-declaration] It has to consistently use one method or the other to avoid warnings, so move to IS_ENABLED() here for readability and build coverage, and move the #ifdef in linux/i2c.h to allow building it as dead code. Fixes: 4a55ed6f89f5 ("i2c: Add GXP SoC I2C Controller") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Hawkins <nick.hawkins@hpe.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17netfilter: nf_tables: support for adding new devices to an existing netdev chainPablo Neira Ayuso
[ Upstream commit b9703ed44ffbfba85c103b9de01886a225e14b38 ] This patch allows users to add devices to an existing netdev chain. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Stable-dep-of: 8509f62b0b07 ("netfilter: nf_tables: hit ENOENT on unexisting chain/flowtable update with missing attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-17rxrpc: Fix timeout of a call that hasn't yet been granted a channelDavid Howells
[ Upstream commit db099c625b13a74d462521a46d98a8ce5b53af5d ] afs_make_call() calls rxrpc_kernel_begin_call() to begin a call (which may get stalled in the background waiting for a connection to become available); it then calls rxrpc_kernel_set_max_life() to set the timeouts - but that starts the call timer so the call timer might then expire before we get a connection assigned - leading to the following oops if the call stalled: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... CPU: 1 PID: 5111 Comm: krxrpcio/0 Not tainted 6.3.0-rc7-build3+ #701 RIP: 0010:rxrpc_alloc_txbuf+0xc0/0x157 ... Call Trace: <TASK> rxrpc_send_ACK+0x50/0x13b rxrpc_input_call_event+0x16a/0x67d rxrpc_io_thread+0x1b6/0x45f ? _raw_spin_unlock_irqrestore+0x1f/0x35 ? rxrpc_input_packet+0x519/0x519 kthread+0xe7/0xef ? kthread_complete_and_exit+0x1b/0x1b ret_from_fork+0x22/0x30 Fix this by noting the timeouts in struct rxrpc_call when the call is created. The timer will be started when the first packet is transmitted. It shouldn't be possible to trigger this directly from userspace through AF_RXRPC as sendmsg() will return EBUSY if the call is in the waiting-for-conn state if it dropped out of the wait due to a signal. Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>