| Age | Commit message (Collapse) | Author |
|
Coverity (ID: 1226842) reported that the return value of
btrfs_discard_extent() is assigned to ret but is immediately
overwritten by unpin_extent_range() without being checked.
Use the same error handling that is done later in the same function.
Signed-off-by: Jingkai Tan <contact@jingk.ai>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Fix netfslib such that when it's making an unbuffered or DIO write, to make
sure that it sends each subrequest strictly sequentially, waiting till the
previous one is 'committed' before sending the next so that we don't have
pieces landing out of order and potentially leaving a hole if an error
occurs (ENOSPC for example).
This is done by copying in just those bits of issuing, collecting and
retrying subrequests that are necessary to do one subrequest at a time.
Retrying, in particular, is simpler because if the current subrequest needs
retrying, the source iterator can just be copied again and the subrequest
prepped and issued again without needing to be concerned about whether it
needs merging with the previous or next in the sequence.
Note that the issuing loop waits for a subrequest to complete right after
issuing it, but this wait could be moved elsewhere allowing preparatory
steps to be performed whilst the subrequest is in progress. In particular,
once content encryption is available in netfslib, that could be done whilst
waiting, as could cleanup of buffers that have been completed.
Fixes: 153a9961b551 ("netfs: Implement unbuffered/DIO write support")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://patch.msgid.link/58526.1772112753@warthog.procyon.org.uk
Tested-by: Steve French <sfrench@samba.org>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since commit af37511305c0 ("firmware: cs_dsp: Don't require client to
provide a struct cs_dsp_client_ops") the client doesn't have to provide
a struct cs_dsp_client_ops. So remove the dummy cs_dsp_client_ops.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://patch.msgid.link/20260226124115.1811187-1-rf@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v7.0
One quirk and a fix for handling of exotic peripherals on cs42l43.
|
|
In decode_choice(), the boundary check before get_len() uses the
variable `len`, which is still 0 from its initialization at the top of
the function:
unsigned int type, ext, len = 0;
...
if (ext || (son->attr & OPEN)) {
BYTE_ALIGN(bs);
if (nf_h323_error_boundary(bs, len, 0)) /* len is 0 here */
return H323_ERROR_BOUND;
len = get_len(bs); /* OOB read */
When the bitstream is exactly consumed (bs->cur == bs->end), the check
nf_h323_error_boundary(bs, 0, 0) evaluates to (bs->cur + 0 > bs->end),
which is false. The subsequent get_len() call then dereferences
*bs->cur++, reading 1 byte past the end of the buffer. If that byte
has bit 7 set, get_len() reads a second byte as well.
This can be triggered remotely by sending a crafted Q.931 SETUP message
with a User-User Information Element containing exactly 2 bytes of
PER-encoded data ({0x08, 0x00}) to port 1720 through a firewall with
the nf_conntrack_h323 helper active. The decoder fully consumes the
PER buffer before reaching this code path, resulting in a 1-2 byte
heap-buffer-overflow read confirmed by AddressSanitizer.
Fix this by checking for 2 bytes (the maximum that get_len() may read)
instead of the uninitialized `len`. This matches the pattern used at
every other get_len() call site in the same file, where the caller
checks for 2 bytes of available data before calling get_len().
Fixes: ec8a8f3c31dd ("netfilter: nf_ct_h323: Extend nf_h323_error_boundary to work on bits as well")
Signed-off-by: Vahagn Vardanian <vahagn@redrays.io>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://patch.msgid.link/20260225130619.1248-2-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In cs35l56_hda_posture_put() assign ucontrol->value.integer.value[0] to
a long instead of an unsigned long. ucontrol->value.integer.value[0] is
a long.
This fixes the sparse warning:
sound/hda/codecs/side-codecs/cs35l56_hda.c:256:20: warning: unsigned value
that used to be signed checked against zero?
sound/hda/codecs/side-codecs/cs35l56_hda.c:252:29: signed value source
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 73cfbfa9caea8 ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Link: https://patch.msgid.link/20260226111728.1700431-1-rf@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
The driver obtains sw_attr.num_ifs from firmware via dpsw_get_attributes()
but never validates it against DPSW_MAX_IF (64). This value controls
iteration in dpaa2_switch_fdb_get_flood_cfg(), which writes port indices
into the fixed-size cfg->if_id[DPSW_MAX_IF] array. When firmware reports
num_ifs >= 64, the loop can write past the array bounds.
Add a bound check for num_ifs in dpaa2_switch_init().
dpaa2_switch_fdb_get_flood_cfg() appends the control interface (port
num_ifs) after all matched ports. When num_ifs == DPSW_MAX_IF and all
ports match the flood filter, the loop fills all 64 slots and the control
interface write overflows by one entry.
The check uses >= because num_ifs == DPSW_MAX_IF is also functionally
broken.
build_if_id_bitmap() silently drops any ID >= 64:
if (id[i] < DPSW_MAX_IF)
bmap[id[i] / 64] |= ...
Fixes: 539dda3c5d19 ("staging: dpaa2-switch: properly setup switching domains")
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/SYBPR01MB78812B47B7F0470B617C408AAF74A@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
udpgro_frglist.sh and udpgro_bench.sh are the flakiest tests
currently in NIPA. They fail in the same exact way, TCP GRO
test stalls occasionally and the test gets killed after 10min.
These tests use veth to simulate GRO. They attach a trivial
("return XDP_PASS;") XDP program to the veth to force TSO off
and NAPI on.
Digging into the failure mode we can see that the connection
is completely stuck after a burst of drops. The sender's snd_nxt
is at sequence number N [1], but the receiver claims to have
received (rcv_nxt) up to N + 3 * MSS [2]. Last piece of the puzzle
is that senders rtx queue is not empty (let's say the block in
the rtx queue is at sequence number N - 4 * MSS [3]).
In this state, sender sends a retransmission from the rtx queue
with a single segment, and sequence numbers N-4*MSS:N-3*MSS [3].
Receiver sees it and responds with an ACK all the way up to
N + 3 * MSS [2]. But sender will reject this ack as TCP_ACK_UNSENT_DATA
because it has no recollection of ever sending data that far out [1].
And we are stuck.
The root cause is the mess of the xmit return codes. veth returns
an error when it can't xmit a frame. We end up with a loss event
like this:
-------------------------------------------------
| GSO super frame 1 | GSO super frame 2 |
|-----------------------------------------------|
| seg | seg | seg | seg | seg | seg | seg | seg |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
-------------------------------------------------
x ok ok <ok>| ok ok ok <x>
\\
snd_nxt
"x" means packet lost by veth, and "ok" means it went thru.
Since veth has TSO disabled in this test it sees individual segments.
Segment 1 is on the retransmit queue and will be resent.
So why did the sender not advance snd_nxt even tho it clearly did
send up to seg 8? tcp_write_xmit() interprets the return code
from the core to mean that data has not been sent at all. Since
TCP deals with GSO super frames, not individual segment the crux
of the problem is that loss of a single segment can be interpreted
as loss of all. TCP only sees the last return code for the last
segment of the GSO frame (in <> brackets in the diagram above).
Of course for the problem to occur we need a setup or a device
without a Qdisc. Otherwise Qdisc layer disconnects the protocol
layer from the device errors completely.
We have multiple ways to fix this.
1) make veth not return an error when it lost a packet.
While this is what I think we did in the past, the issue keeps
reappearing and it's annoying to debug. The game of whack
a mole is not great.
2) fix the damn return codes
We only talk about NETDEV_TX_OK and NETDEV_TX_BUSY in the
documentation, so maybe we should make the return code from
ndo_start_xmit() a boolean. I like that the most, but perhaps
some ancient, not-really-networking protocol would suffer.
3) make TCP ignore the errors
It is not entirely clear to me what benefit TCP gets from
interpreting the result of ip_queue_xmit()? Specifically once
the connection is established and we're pushing data - packet
loss is just packet loss?
4) this fix
Ignore the rc in the Qdisc-less+GSO case, since it's unreliable.
We already always return OK in the TCQ_F_CAN_BYPASS case.
In the Qdisc-less case let's be a bit more conservative and only
mask the GSO errors. This path is taken by non-IP-"networks"
like CAN, MCTP etc, so we could regress some ancient thing.
This is the simplest, but also maybe the hackiest fix?
Similar fix has been proposed by Eric in the past but never committed
because original reporter was working with an OOT driver and wasn't
providing feedback (see Link).
Link: https://lore.kernel.org/CANn89iJcLepEin7EtBETrZ36bjoD9LrR=k4cfwWh046GB+4f9A@mail.gmail.com
Fixes: 1f59533f9ca5 ("qdisc: validate frames going through the direct_xmit path")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260223235100.108939-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Bobby Eshleman says:
====================
vsock: add write-once semantics to child_ns_mode
Two administrator processes may race when setting child_ns_mode: one
sets it to "local" and creates a namespace, but another changes it to
"global" in between. The first process ends up with a namespace in the
wrong mode. Make child_ns_mode write-once so that a namespace manager
can set it once, check the value, and be guaranteed it won't change
before creating its namespaces. Writing a different value after the
first write returns -EBUSY.
One patch for the implementation, one for docs, and one for tests.
v2: https://lore.kernel.org/r/20260218-vsock-ns-write-once-v2-0-19e4c50d509a@meta.com
v1: https://lore.kernel.org/r/20260217-vsock-ns-write-once-v1-1-a1fb30f289a9@meta.com
====================
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-0-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Update the vsock child_ns_mode documentation to include the new
write-once semantics of setting child_ns_mode. The semantics are
implemented in a preceding patch in this series.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-3-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Two administrator processes may race when setting child_ns_mode as one
process sets child_ns_mode to "local" and then creates a namespace, but
another process changes child_ns_mode to "global" between the write and
the namespace creation. The first process ends up with a namespace in
"global" mode instead of "local". While this can be detected after the
fact by reading ns_mode and retrying, it is fragile and error-prone.
Make child_ns_mode write-once so that a namespace manager can set it
once and be sure it won't change. Writing a different value after the
first write returns -EBUSY. This applies to all namespaces, including
init_net, where an init process can write "local" to lock all future
namespaces into local mode.
Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
Suggested-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Co-developed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-2-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The child_ns_mode sysctl parameter becomes write-once in a future patch
in this series, which breaks existing tests. This patch updates the
tests to respect this new policy. No additional tests are added.
Add "global-parent" and "local-parent" namespaces as intermediaries to
spawn namespaces in the given modes. This avoids the need to change
"child_ns_mode" in the init_ns. nsenter must be used because ip netns
unshares the mount namespace so nested "ip netns add" breaks exec calls
from the init ns. Adds nsenter to the deps check.
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Link: https://patch.msgid.link/20260223-vsock-ns-write-once-v3-1-c0cde6959923@meta.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Fix the following compilation error:
ERROR: modpost: module ib_uverbs uses symbol dma_buf_move_notify
from namespace DMA_BUF, but does not import it.
Fixes: 0ac6f4056c4a ("RDMA/uverbs: Add DMABUF object type and operations")
Link: https://patch.msgid.link/20260225-fix-uverbs-compilation-v1-1-acf7b3d0f9fa@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Guillaume reported crashes via corrupted RCU callback function pointers
during KUnit testing. The crash was traced back to the pidfs rhashtable
conversion which replaced the 24-byte rb_node with an 8-byte rhash_head
in struct pid, shrinking it from 160 to 144 bytes.
struct kthread (without CONFIG_BLK_CGROUP) is also 144 bytes. With
CONFIG_SLAB_MERGE_DEFAULT and SLAB_HWCACHE_ALIGN both round up to
192 bytes and share the same slab cache. struct pid.rcu.func and
struct kthread.affinity_node both sit at offset 0x78.
When a kthread exits via make_task_dead() it bypasses kthread_exit() and
misses the affinity_node cleanup. free_kthread_struct() frees the memory
while the node is still linked into the global kthread_affinity_list. A
subsequent list_del() by another kthread writes through dangling list
pointers into the freed and reused memory, corrupting the pid's
rcu.func pointer.
Instead of patching free_kthread_struct() to handle the missed cleanup,
consolidate all kthread exit paths. Turn kthread_exit() into a macro
that calls do_exit() and add kthread_do_exit() which is called from
do_exit() for any task with PF_KTHREAD set. This guarantees that
kthread-specific cleanup always happens regardless of the exit path -
make_task_dead(), direct do_exit(), or kthread_exit().
Replace __to_kthread() with a new tsk_is_kthread() accessor in the
public header. Export do_exit() since module code using the
kthread_exit() macro now needs it directly.
Reported-by: Guillaume Tucker <gtucker@gtucker.io>
Tested-by: Guillaume Tucker <gtucker@gtucker.io>
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: David Gow <davidgow@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/20260224-mittlerweile-besessen-2738831ae7f6@brauner
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 4d13f4304fa4 ("kthread: Implement preferred affinity")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
iomap's directio implementation has two magic errno codes that it uses
to signal callers -- ENOTBLK tells the filesystem that it should retry
a write with the pagecache; and EAGAIN tells the caller that pagecache
flushing or invalidation failed and that it should try again.
Neither of these indicate data loss, so let's not report them.
Fixes: a9d573ee88af98 ("iomap: report file I/O errors to the VFS")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://patch.msgid.link/20260224154637.GD2390381@frogsfrogsfrogs
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The entry of the validators table for UAC3 AC header descriptor is
defined with the wrong protocol version UAC_VERSION_2, while it should
have been UAC_VERSION_3. This results in the validator never matching
for actual UAC3 devices (protocol == UAC_VERSION_3), causing their
header descriptors to bypass validation entirely. A malicious USB
device presenting a truncated UAC3 header could exploit this to cause
out-of-bounds reads when the driver later accesses unvalidated
descriptor fields.
The bug was introduced in the same commit as the recently fixed UAC3
feature unit sub-type typo, and appears to be from the same copy-paste
error when the UAC3 section was created from the UAC2 section.
Fixes: 57f8770620e9 ("ALSA: usb-audio: More validations of descriptor units")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jun Seo <jun.seo.93@proton.me>
Link: https://patch.msgid.link/20260226010820.36529-1-jun.seo.93@proton.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
fix mute/micmute LEDs and headset microphone for Acer Nitro ANV15-51.
[ The headset microphone issue is solved by Kailang]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220279
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Heng <zhangheng@kylinos.cn>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20260209134149.3076957-1-zhangheng@kylinos.cn
|
|
Tariq Toukan says:
====================
mlx5 misc fixes 2026-02-24
This patchset provides misc bug fixes from the team to the mlx5
core and Eth drivers.
====================
Link: https://patch.msgid.link/20260224114652.1787431-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix a "scheduling while atomic" bug in mlx5e_ipsec_init_macs() by
replacing mlx5_query_mac_address() with ether_addr_copy() to get the
local MAC address directly from netdev->dev_addr.
The issue occurs because mlx5_query_mac_address() queries the hardware
which involves mlx5_cmd_exec() that can sleep, but it is called from
the mlx5e_ipsec_handle_event workqueue which runs in atomic context.
The MAC address is already available in netdev->dev_addr, so no need
to query hardware. This avoids the sleeping call and resolves the bug.
Call trace:
BUG: scheduling while atomic: kworker/u112:2/69344/0x00000200
__schedule+0x7ab/0xa20
schedule+0x1c/0xb0
schedule_timeout+0x6e/0xf0
__wait_for_common+0x91/0x1b0
cmd_exec+0xa85/0xff0 [mlx5_core]
mlx5_cmd_exec+0x1f/0x50 [mlx5_core]
mlx5_query_nic_vport_mac_address+0x7b/0xd0 [mlx5_core]
mlx5_query_mac_address+0x19/0x30 [mlx5_core]
mlx5e_ipsec_init_macs+0xc1/0x720 [mlx5_core]
mlx5e_ipsec_build_accel_xfrm_attrs+0x422/0x670 [mlx5_core]
mlx5e_ipsec_handle_event+0x2b9/0x460 [mlx5_core]
process_one_work+0x178/0x2e0
worker_thread+0x2ea/0x430
Fixes: cee137a63431 ("net/mlx5e: Handle ESN update events")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224114652.1787431-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The cited commit miss to add locking in the error path of
mlx5_sriov_enable(). When pci_enable_sriov() fails,
mlx5_device_disable_sriov() is called to clean up. This cleanup function
now expects to be called with the devlink instance lock held.
Add the missing devl_lock(devlink) and devl_unlock(devlink)
Fixes: 84a433a40d0e ("net/mlx5: Lock mlx5 devlink reload callbacks")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224114652.1787431-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The cited commit introduced MLX5_PRIV_FLAGS_SWITCH_LEGACY to identify
when a transition to legacy mode is requested via devlink. However, the
logic failed to clear this flag if the mode was subsequently changed
back to MLX5_ESWITCH_OFFLOADS (switchdev). Consequently, if a user
toggled from legacy to switchdev, the flag remained set, leaving the
driver with wrong state indicating
Fix this by explicitly clearing the MLX5_PRIV_FLAGS_SWITCH_LEGACY bit
when the requested mode is MLX5_ESWITCH_OFFLOADS.
Fixes: 2a4f56fbcc47 ("net/mlx5e: Keep netdev when leave switchdev for devlink set legacy only")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224114652.1787431-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5_lag_disable_change() unconditionally called mlx5_disable_lag() when
LAG was active, which is incorrect for MLX5_LAG_MODE_MPESW.
Hnece, call mlx5_disable_mpesw() when running in MPESW mode.
Fixes: a32327a3a02c ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224114652.1787431-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix a circular locking dependency between dbg_mutex and the domain
rx/tx mutexes that could lead to a deadlock.
The dump path in dr_dump_domain_all() was acquiring locks in the order:
dbg_mutex -> rx.mutex -> tx.mutex
While the table/matcher creation paths acquire locks in the order:
rx.mutex -> tx.mutex -> dbg_mutex
This inverted lock ordering creates a circular dependency. Fix this by
changing dr_dump_domain_all() to acquire the domain lock before
dbg_mutex, matching the order used in mlx5dr_table_create() and
mlx5dr_matcher_create().
Lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
6.19.0-rc6net_next_e817c4e #1 Not tainted
------------------------------------------------------
sos/30721 is trying to acquire lock:
ffff888102df5900 (&dmn->info.rx.mutex){+.+.}-{4:4}, at:
dr_dump_start+0x131/0x450 [mlx5_core]
but task is already holding lock:
ffff888102df5bc0 (&dmn->dump_info.dbg_mutex){+.+.}-{4:4}, at:
dr_dump_start+0x10b/0x450 [mlx5_core]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&dmn->dump_info.dbg_mutex){+.+.}-{4:4}:
__mutex_lock+0x91/0x1060
mlx5dr_matcher_create+0x377/0x5e0 [mlx5_core]
mlx5_cmd_dr_create_flow_group+0x62/0xd0 [mlx5_core]
mlx5_create_flow_group+0x113/0x1c0 [mlx5_core]
mlx5_chains_create_prio+0x453/0x2290 [mlx5_core]
mlx5_chains_get_table+0x2e2/0x980 [mlx5_core]
esw_chains_create+0x1e6/0x3b0 [mlx5_core]
esw_create_offloads_fdb_tables.cold+0x62/0x63f [mlx5_core]
esw_offloads_enable+0x76f/0xd20 [mlx5_core]
mlx5_eswitch_enable_locked+0x35a/0x500 [mlx5_core]
mlx5_devlink_eswitch_mode_set+0x561/0x950 [mlx5_core]
devlink_nl_eswitch_set_doit+0x67/0xe0
genl_family_rcv_msg_doit+0xe0/0x130
genl_rcv_msg+0x188/0x290
netlink_rcv_skb+0x4b/0xf0
genl_rcv+0x24/0x40
netlink_unicast+0x1ed/0x2c0
netlink_sendmsg+0x210/0x450
__sock_sendmsg+0x38/0x60
__sys_sendto+0x119/0x180
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x70/0xd00
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #1 (&dmn->info.tx.mutex){+.+.}-{4:4}:
__mutex_lock+0x91/0x1060
mlx5dr_table_create+0x11d/0x530 [mlx5_core]
mlx5_cmd_dr_create_flow_table+0x62/0x140 [mlx5_core]
__mlx5_create_flow_table+0x46f/0x960 [mlx5_core]
mlx5_create_flow_table+0x16/0x20 [mlx5_core]
esw_create_offloads_fdb_tables+0x136/0x240 [mlx5_core]
esw_offloads_enable+0x76f/0xd20 [mlx5_core]
mlx5_eswitch_enable_locked+0x35a/0x500 [mlx5_core]
mlx5_devlink_eswitch_mode_set+0x561/0x950 [mlx5_core]
devlink_nl_eswitch_set_doit+0x67/0xe0
genl_family_rcv_msg_doit+0xe0/0x130
genl_rcv_msg+0x188/0x290
netlink_rcv_skb+0x4b/0xf0
genl_rcv+0x24/0x40
netlink_unicast+0x1ed/0x2c0
netlink_sendmsg+0x210/0x450
__sock_sendmsg+0x38/0x60
__sys_sendto+0x119/0x180
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x70/0xd00
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #0 (&dmn->info.rx.mutex){+.+.}-{4:4}:
__lock_acquire+0x18b6/0x2eb0
lock_acquire+0xd3/0x2c0
__mutex_lock+0x91/0x1060
dr_dump_start+0x131/0x450 [mlx5_core]
seq_read_iter+0xe3/0x410
seq_read+0xfb/0x130
full_proxy_read+0x53/0x80
vfs_read+0xba/0x330
ksys_read+0x65/0xe0
do_syscall_64+0x70/0xd00
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&dmn->dump_info.dbg_mutex);
lock(&dmn->info.tx.mutex);
lock(&dmn->dump_info.dbg_mutex);
lock(&dmn->info.rx.mutex);
*** DEADLOCK ***
Fixes: 9222f0b27da2 ("net/mlx5: DR, Add support for dumping steering info")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260224114652.1787431-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
A good number of fixes:
- cfg80211:
- cancel rfkill work appropriately
- fix radiotap parsing to correctly reject field 18
- fix wext (yes...) off-by-one for IGTK key ID
- mac80211:
- fix for mesh NULL pointer dereference
- fix for stack out-of-bounds (2 bytes) write on
specific multi-link action frames
- set default WMM parameters for all links
- mwifiex: check dev_alloc_name() return value correctly
- libertas: fix potential timer use-after-free
- brcmfmac: fix crash on probe failure
* tag 'wireless-2026-02-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mac80211: fix NULL pointer dereference in mesh_rx_csa_frame()
wifi: mac80211: bounds-check link_id in ieee80211_ml_reconfiguration
wifi: mac80211: set default WMM parameters on all links
wifi: libertas: fix use-after-free in lbs_free_adapter()
wifi: mwifiex: Fix dev_alloc_name() return value check
wifi: brcmfmac: Fix potential kernel oops when probe fails
wifi: radiotap: reject radiotap with unknown bits
wifi: cfg80211: cancel rfkill_block work in wiphy_unregister()
wifi: cfg80211: wext: fix IGTK key ID off-by-one
====================
Link: https://patch.msgid.link/20260225113159.360574-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ido Schimmel says:
====================
team: Fix reference count leak when changing port netns
Patch #1 fixes a reference count leak that was reported by syzkaller.
The leak happens when a net device that is member in a team is changing
netns. The fix is to align the team driver with the bond driver and have
it suppress NETDEV_CHANGEMTU events for a net device that is being
unregistered.
Without this change, the NETDEV_CHANGEMTU event causes inetdev_event()
to recreate an inet device for this net device in its original netns,
after it was previously destroyed upon NETDEV_UNREGISTER. Later on, when
inetdev_event() receives a NETDEV_REGISTER event for this net device in
the new nents, it simply leaks the reference:
case NETDEV_REGISTER:
pr_debug("%s: bug\n", __func__);
RCU_INIT_POINTER(dev->ip_ptr, NULL);
break;
addrconf_notify() handles this differently and reuses the existing inet6
device if one exists when a NETDEV_REGISTER event is received. This
creates a different problem where it is possible for a net device to
reference an inet6 device that was created in a previous netns.
A more generic fix that we can try in net-next is to revert the changes
in the bond and team drivers and instead have IPv4 and IPv6 destroy and
recreate an inet device if one already exists upon NETDEV_REGISTER.
Patch #2 adds a selftest that passes with the fix and hangs without it.
====================
Link: https://patch.msgid.link/20260224125709.317574-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a test for the issue that was fixed in "team: avoid NETDEV_CHANGEMTU
event when unregistering slave".
The test hangs due to a reference count leak without the fix:
# make -C tools/testing/selftests TARGETS="drivers/net/team" TEST_PROGS=refleak.sh TEST_GEN_PROGS="" run_tests
[...]
TAP version 13
1..1
# timeout set to 45
# selftests: drivers/net/team: refleak.sh
[ 50.681299][ T496] unregister_netdevice: waiting for dummy1 to become free. Usage count = 3
[ 71.185325][ T496] unregister_netdevice: waiting for dummy1 to become free. Usage count = 3
And passes with the fix:
# make -C tools/testing/selftests TARGETS="drivers/net/team" TEST_PROGS=refleak.sh TEST_GEN_PROGS="" run_tests
[...]
TAP version 13
1..1
# timeout set to 45
# selftests: drivers/net/team: refleak.sh
ok 1 selftests: drivers/net/team: refleak.sh
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260224125709.317574-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot is reporting
unregister_netdevice: waiting for netdevsim0 to become free. Usage count = 3
ref_tracker: netdev@ffff88807dcf8618 has 1/2 users at
__netdev_tracker_alloc include/linux/netdevice.h:4400 [inline]
netdev_hold include/linux/netdevice.h:4429 [inline]
inetdev_init+0x201/0x4e0 net/ipv4/devinet.c:286
inetdev_event+0x251/0x1610 net/ipv4/devinet.c:1600
notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85
call_netdevice_notifiers_mtu net/core/dev.c:2318 [inline]
netif_set_mtu_ext+0x5aa/0x800 net/core/dev.c:9886
netif_set_mtu+0xd7/0x1b0 net/core/dev.c:9907
dev_set_mtu+0x126/0x260 net/core/dev_api.c:248
team_port_del+0xb07/0xcb0 drivers/net/team/team_core.c:1333
team_del_slave drivers/net/team/team_core.c:1936 [inline]
team_device_event+0x207/0x5b0 drivers/net/team/team_core.c:2929
notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2281 [inline]
call_netdevice_notifiers net/core/dev.c:2295 [inline]
__dev_change_net_namespace+0xcb7/0x2050 net/core/dev.c:12592
do_setlink+0x2ce/0x4590 net/core/rtnetlink.c:3060
rtnl_changelink net/core/rtnetlink.c:3776 [inline]
__rtnl_newlink net/core/rtnetlink.c:3935 [inline]
rtnl_newlink+0x15a9/0x1be0 net/core/rtnetlink.c:4072
rtnetlink_rcv_msg+0x7d5/0xbe0 net/core/rtnetlink.c:6958
netlink_rcv_skb+0x232/0x4b0 net/netlink/af_netlink.c:2550
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x80f/0x9b0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x813/0xb40 net/netlink/af_netlink.c:1894
problem. Ido Schimmel found steps to reproduce
ip link add name team1 type team
ip link add name dummy1 mtu 1499 master team1 type dummy
ip netns add ns1
ip link set dev dummy1 netns ns1
ip -n ns1 link del dev dummy1
and also found that the same issue was fixed in the bond driver in
commit f51048c3e07b ("bonding: avoid NETDEV_CHANGEMTU event when
unregistering slave").
Let's do similar thing for the team driver, with commit ad7c7b2172c3 ("net:
hold netdev instance lock during sysfs operations") and commit 303a8487a657
("net: s/__dev_set_mtu/__netif_set_mtu/") also applied.
Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Suggested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20260224125709.317574-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While testing corner cases in the driver, a use-after-free crash
was found on the service rescan PCI path.
When mana_serv_reset() calls mana_gd_suspend(), mana_gd_cleanup()
destroys gc->service_wq. If the subsequent mana_gd_resume() fails
with -ETIMEDOUT or -EPROTO, the code falls through to
mana_serv_rescan() which triggers pci_stop_and_remove_bus_device().
This invokes the PCI .remove callback (mana_gd_remove), which calls
mana_gd_cleanup() a second time, attempting to destroy the already-
freed workqueue. Fix this by NULL-checking gc->service_wq in
mana_gd_cleanup() and setting it to NULL after destruction.
Call stack of issue for reference:
[Sat Feb 21 18:53:48 2026] Call Trace:
[Sat Feb 21 18:53:48 2026] <TASK>
[Sat Feb 21 18:53:48 2026] mana_gd_cleanup+0x33/0x70 [mana]
[Sat Feb 21 18:53:48 2026] mana_gd_remove+0x3a/0xc0 [mana]
[Sat Feb 21 18:53:48 2026] pci_device_remove+0x41/0xb0
[Sat Feb 21 18:53:48 2026] device_remove+0x46/0x70
[Sat Feb 21 18:53:48 2026] device_release_driver_internal+0x1e3/0x250
[Sat Feb 21 18:53:48 2026] device_release_driver+0x12/0x20
[Sat Feb 21 18:53:48 2026] pci_stop_bus_device+0x6a/0x90
[Sat Feb 21 18:53:48 2026] pci_stop_and_remove_bus_device+0x13/0x30
[Sat Feb 21 18:53:48 2026] mana_do_service+0x180/0x290 [mana]
[Sat Feb 21 18:53:48 2026] mana_serv_func+0x24/0x50 [mana]
[Sat Feb 21 18:53:48 2026] process_one_work+0x190/0x3d0
[Sat Feb 21 18:53:48 2026] worker_thread+0x16e/0x2e0
[Sat Feb 21 18:53:48 2026] kthread+0xf7/0x130
[Sat Feb 21 18:53:48 2026] ? __pfx_worker_thread+0x10/0x10
[Sat Feb 21 18:53:48 2026] ? __pfx_kthread+0x10/0x10
[Sat Feb 21 18:53:48 2026] ret_from_fork+0x269/0x350
[Sat Feb 21 18:53:48 2026] ? __pfx_kthread+0x10/0x10
[Sat Feb 21 18:53:48 2026] ret_from_fork_asm+0x1a/0x30
[Sat Feb 21 18:53:48 2026] </TASK>
Fixes: 505cc26bcae0 ("net: mana: Add support for auxiliary device servicing events")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/aZ2bzL64NagfyHpg@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace Vinod Koul with Mohd Ayaan Anwar as the maintainer of the
QUALCOMM ETHQOS ETHERNET DRIVER. Vinod confirmed he is no longer
active in this area and agreed to be removed.
Acked-by: Vinod Koul <vkoul@kernel.org>
Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
Signed-off-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20260224-qcom_ethqos_maintainer-v1-1-24e02701ea52@oss.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The devm_add_action_or_reset() function already executes the cleanup
action on failure before returning an error, so the explicit goto error
and subsequent zl3073x_dev_dpll_fini() call causes double cleanup.
Fixes: ebb1031c5137 ("dpll: zl3073x: Refactor DPLL initialization")
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Link: https://patch.msgid.link/20260224-dpll-v2-1-d7786414a830@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Simon Baatz says:
====================
tcp: re-enable acceptance of FIN packets when RWIN is 0
this series restores the ability to accept in‑sequence FIN packets
even when the advertised receive window is zero, and adds a
packetdrill test to guard the behavior.
====================
Link: https://patch.msgid.link/20260224-fix_zero_wnd_fin-v2-0-a16677ea7cea@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a packetdrill test that verifies we accept bare FIN packets when
the advertised receive window is zero.
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260224-fix_zero_wnd_fin-v2-2-a16677ea7cea@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 2bd99aef1b19 ("tcp: accept bare FIN packets under memory
pressure") allowed accepting FIN packets in tcp_data_queue() even when
the receive window was closed, to prevent ACK/FIN loops with broken
clients.
Such a FIN packet is in sequence, but because the FIN consumes a
sequence number, it extends beyond the window. Before commit
9ca48d616ed7 ("tcp: do not accept packets beyond window"),
tcp_sequence() only required the seq to be within the window. After
that change, the entire packet (including the FIN) must fit within the
window. As a result, such FIN packets are now dropped and the handling
path is no longer reached.
Be more lenient by not counting the sequence number consumed by the
FIN when calling tcp_sequence(), restoring the previous behavior for
cases where only the FIN extends beyond the window.
Fixes: 9ca48d616ed7 ("tcp: do not accept packets beyond window")
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260224-fix_zero_wnd_fin-v2-1-a16677ea7cea@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
current->nsproxy is should not be accessed directly as syzbot has found
that it could be NULL at times, causing crashes. Fix up the af_vsock
sysctl handlers to use container_of() to deal with the current net
namespace instead of attempting to rely on current.
This is the same type of change done in commit 7f5611cbc487 ("rds:
sysctl: rds_tcp_{rcv,snd}buf: avoid using current->nsproxy")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bobby Eshleman <bobbyeshleman@meta.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
Link: https://patch.msgid.link/2026022318-rearview-gallery-ae13@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kaweth driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://patch.msgid.link/2026022305-substance-virtual-c728@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The kalmia driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: d40261236e8e ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730")
Link: https://patch.msgid.link/2026022326-shack-headstone-ef6f@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The pegasus driver should validate that the device it is probing has the
proper number and types of USB endpoints it is expecting before it binds
to it. If a malicious device were to not have the same urbs the driver
will crash later on when it blindly accesses these endpoints.
Cc: Petko Manolov <petkan@nucleusys.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026022347-legibly-attest-cc5c@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the device is disconnected from the driver, there is a "dangling"
reference count on the usb interface that was grabbed in the probe
callback. Fix this up by properly dropping the reference after we are
done with it.
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Fixes: c46ee38620a2 ("NFC: pn533: add NXP pn533 nfc device driver")
Link: https://patch.msgid.link/2026022329-flashing-ought-7573@gregkh
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Do not share the page cache if the real @aops differs
- Fix the incomplete condition for interlaced plain extents
- Get rid of more unnecessary #ifdefs
* tag 'erofs-for-7.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix interlaced plain identification for encoded extents
erofs: remove more unnecessary #ifdefs
erofs: allow sharing page cache with the same aops only
|
|
scx_idle_node_masks is allocated with num_possible_nodes() elements but
indexed by NUMA node IDs via for_each_node(). On systems with
non-contiguous NUMA node numbering (e.g. nodes 0 and 4), node IDs can
exceed the array size, causing out-of-bounds memory corruption.
Use nr_node_ids instead, which represents the maximum node ID range and
is the correct size for arrays indexed by node ID.
Fixes: 7c60329e3521 ("sched_ext: Add NUMA-awareness to the default idle selection policy")
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
A workaround was introduced in commit 1fb710793ce2 ("drm/amdgpu: Enable
MES lr_compute_wa by default") to help with some hangs observed in gfx1151.
This WA didn't fully fix the issue. It was actually fixed by adjusting
the VGPR size to the correct value that matched the hardware in commit
b42f3bf9536c ("drm/amdkfd: bump minimum vgpr size for gfx1151").
There are reports of instability on other products with newer GC microcode
versions, and I believe they're caused by this workaround. As we don't
need the workaround any more, remove it.
Fixes: b42f3bf9536c ("drm/amdkfd: bump minimum vgpr size for gfx1151")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9973e64bd6ee7642860a6f3b6958cbf14e89cabd)
Cc: stable@vger.kernel.org
|
|
If the device has not recovered after slot reset is called, it goes to
out label for error handling. There it could make decision based on
uninitialized hive pointer and could result in accessing an uninitialized
list.
Initialize the list and hive properly so that it handles the error
situation and also releases the reset domain lock which is acquired
during error_detected callback.
Fixes: 732c6cefc1ec ("drm/amdgpu: Replace tmp_adev with hive in amdgpu_pci_slot_reset")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bb71362182e59caa227e4192da5a612b09349696)
|
|
This will set AMDGPU_VCN_SMU_DPM_INTERFACE_* smu_type
based on soc type and fixing ring timeout issue seen
for DPM enabled case.
Signed-off-by: sguttula <suresh.guttula@amd.com>
Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f0f23c315b38c55e8ce9484cf59b65811f350630)
|
|
Do not unlock psp->ras_context.mutex if it has not been locked. This has
been detected by the Clang thread-safety analyzer.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: YiPeng Chai <YiPeng.Chai@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Fixes: b3fb79cda568 ("drm/amdgpu: add mutex to protect ras shared memory")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6fa01b4335978051d2cd80841728fd63cc597970)
|
|
Mutexes must be unlocked before these are destroyed. This has been detected
by the Clang thread-safety analyzer.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Fixes: f5e4cc8461c4 ("drm/amdgpu: implement RAS ACA driver framework")
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 270258ba320beb99648dceffb67e86ac76786e55)
|
|
This can be called while preemption is disabled, for example by
dcn32_internal_validate_bw which is called with the FPU active.
Fixes "BUG: scheduling while atomic" messages I encounter on my Navi31
machine.
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b42dae2ebc5c84a68de63ec4ffdfec49362d53f1)
Cc: stable@vger.kernel.org
|
|
Huge input values in amdgpu_userq_wait_ioctl can lead to a OOM and
could be exploited.
So check these input value against AMDGPU_USERQ_MAX_HANDLES
which is big enough value for genuine use cases and could
potentially avoid OOM.
v2: squash in Srini's fix
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fcec012c664247531aed3e662f4280ff804d1476)
Cc: stable@vger.kernel.org
|
|
Huge input values in amdgpu_userq_signal_ioctl can lead to a OOM and
could be exploited.
So check these input value against AMDGPU_USERQ_MAX_HANDLES
which is big enough value for genuine use cases and could
potentially avoid OOM.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit be267e15f99bc97cbe202cd556717797cdcf79a5)
Cc: stable@vger.kernel.org
|
|
Userspace can either deliberately pass in the too small num_fences, or the
required number can legitimately grow between the two calls to the userq
wait ioctl. In both cases we do not want the emit the kernel warning
backtrace since nothing is wrong with the kernel and userspace will simply
get an errno reported back. So lets simply drop the WARN_ONs.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL")
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2c333ea579de6cc20ea7bc50e9595ef72863e65c)
|
|
Drop reference to syncobj and timeline fence when aborting the ioctl due
output array being too small.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL")
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 68951e9c3e6bb22396bc42ef2359751c8315dd27)
Cc: <stable@vger.kernel.org> # v6.16+
|