summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)Author
11 daysdrm/amdgpu: Fix is_dpm_runningPratik Vishwakarma
Use multi args for get_enabled_mask to fix is_dpm_running Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amdgpu: Fix set_default_dpm_tablesPratik Vishwakarma
Use smu_v15_0_0_update_table instead of common api Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/admgpu: Update metrics_table for SMU15Pratik Vishwakarma
Use multi param based get op for metrics_table Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amdgpu: Add support for update_table for SMU15Pratik Vishwakarma
Add update_table for SMU 15_0_0 Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/swsmu: Add new param regs for SMU15Pratik Vishwakarma
Some SMU messages have changed to multi reg read/write Initialize during smu_early_init Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amdgpu: Load TA ucode for PSP 15_0_0Pratik Vishwakarma
TOC and TA both are required Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/pm: send unload command to smu during modprobe -r amdgpuKenneth Feng
Send unload command to smu during modprobe -r amdgpu for smu 13/14. 1. This can fix the high voltage/temperatue issue after driver is unloaded. 2. Reloading driver could fail but with the debug port based mode1 reset during driver is reloaded, it is good and safe. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/pm: use debug port for mode1 reset request on smu 13&14Kenneth Feng
use debug port for mode1 reset request so fw can handle mode1 reset even when it is stuck. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Reject cursor plane on DCE when scaled differently than primaryTimur Kristóf
Currently DCE doesn't support the overlay cursor, so the dm_crtc_get_cursor_mode() function returns DM_CURSOR_NATIVE_MODE unconditionally. The outcome is that it doesn't check for the conditions that would necessitate the overlay cursor, meaning that it doesn't reject cases where the native cursor mode isn't supported on DCE. Remove the early return from dm_crtc_get_cursor_mode() for DCE and instead let it perform the necessary checks and return DM_CURSOR_OVERLAY_MODE. Add a later check that rejects when DM_CURSOR_OVERLAY_MODE would be used with DCE. Fixes: 1b04dcca4fb1 ("drm/amd/display: Introduce overlay cursor mode") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4600 Suggested-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/pm: use sysfs_streq for string matching in amdgpu_pmYang Wang
The driver uses strncmp() to compare sysfs attribute strings, which does not handle trailing newlines and lacks NULL safety. sysfs_streq() is the recommended function for sysfs string equality checks in the kernel, providing safer and more correct behavior. replace strncmp() with sysfs_streq() in drivers/gpu/drm/amd/pm/amdgpu_pm.c Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amdkfd: Fix watch_id bounds checking in debug address watch v2Srinivasan Shanmugam
The address watch clear code receives watch_id as an unsigned value (u32), but some helper functions were using a signed int and checked bits by shifting with watch_id. If a very large watch_id is passed from userspace, it can be converted to a negative value. This can cause invalid shifts and may access memory outside the watch_points array. drm/amdkfd: Fix watch_id bounds checking in debug address watch v2 Fix this by checking that watch_id is within MAX_WATCH_ADDRESSES before using it. Also use BIT(watch_id) to test and clear bits safely. This keeps the behavior unchanged for valid watch IDs and avoids undefined behavior for invalid ones. Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:448 kfd_dbg_trap_clear_dev_address_watch() error: buffer overflow 'pdd->watch_points' 4 <= u32max user_rl='0-3,2147483648-u32max' uncapped drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c 433 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd, 434 uint32_t watch_id) 435 { 436 int r; 437 438 if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id)) kfd_dbg_owns_dev_watch_id() doesn't check for negative values so if watch_id is larger than INT_MAX it leads to a buffer overflow. (Negative shifts are undefined). 439 return -EINVAL; 440 441 if (!pdd->dev->kfd->shared_resources.enable_mes) { 442 r = debug_lock_and_unmap(pdd->dev->dqm); 443 if (r) 444 return r; 445 } 446 447 amdgpu_gfx_off_ctrl(pdd->dev->adev, false); --> 448 pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch( 449 pdd->dev->adev, 450 watch_id); v2: (as per, Jonathan Kim) - Add early watch_id >= MAX_WATCH_ADDRESSES validation in the set path to match the clear path. - Drop the redundant bounds check in kfd_dbg_owns_dev_watch_id(). Fixes: e0f85f4690d0 ("drm/amdkfd: add debug set and clear address watch points operation") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jonathan Kim <jonathan.kim@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amdgpu: Fix missing unwind in amdgpu_ib_schedule() error pathSrinivasan Shanmugam
amdgpu_ib_schedule() returns early after calling amdgpu_ring_undo(). This skips the common free_fence cleanup path. Other error paths were already changed to use goto free_fence, but this one was missed. Change the early return to goto free_fence so all error paths clean up the same way. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c:232 amdgpu_ib_schedule() warn: missing unwind goto? drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 124 int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned int num_ibs, 125 struct amdgpu_ib *ibs, struct amdgpu_job *job, 126 struct dma_fence **f) 127 { ... 224 225 if (ring->funcs->insert_start) 226 ring->funcs->insert_start(ring); 227 228 if (job) { 229 r = amdgpu_vm_flush(ring, job, need_pipe_sync); 230 if (r) { 231 amdgpu_ring_undo(ring); --> 232 return r; The patch changed the other error paths to goto free_fence but this one was accidentally skipped. 233 } 234 } 235 236 amdgpu_ring_ib_begin(ring); ... 338 339 free_fence: 340 if (!job) 341 kfree(af); 342 return r; 343 } Fixes: f903b85ed0f1 ("drm/amdgpu: fix possible fence leaks from job structure") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Fix dc_link NULL handling in HPD initSrinivasan Shanmugam
amdgpu_dm_hpd_init() may see connectors without a valid dc_link. The code already checks dc_link for the polling decision, but later unconditionally dereferences it when setting up HPD interrupts. Assign dc_link early and skip connectors where it is NULL. Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_irq.c:940 amdgpu_dm_hpd_init() error: we previously assumed 'dc_link' could be null (see line 931) drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_irq.c 923 /* 924 * Analog connectors may be hot-plugged unlike other connector 925 * types that don't support HPD. Only poll analog connectors. 926 */ 927 use_polling |= 928 amdgpu_dm_connector->dc_link && ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The patch adds this NULL check but hopefully it can be removed 929 dc_connector_supports_analog(amdgpu_dm_connector->dc_link->link_id.id); 930 931 dc_link = amdgpu_dm_connector->dc_link; dc_link assigned here. 932 933 /* 934 * Get a base driver irq reference for hpd ints for the lifetime 935 * of dm. Note that only hpd interrupt types are registered with 936 * base driver; hpd_rx types aren't. IOW, amdgpu_irq_get/put on 937 * hpd_rx isn't available. DM currently controls hpd_rx 938 * explicitly with dc_interrupt_set() 939 */ --> 940 if (dc_link->irq_source_hpd != DC_IRQ_SOURCE_INVALID) { ^^^^^^^^^^^^^^^^^^^^^^^ If it's NULL then we are trouble because we dereference it here. 941 irq_type = dc_link->irq_source_hpd - DC_IRQ_SOURCE_HPD1; 942 /* 943 * TODO: There's a mismatch between mode_info.num_hpd 944 * and what bios reports as the # of connectors with hpd Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Cc: Timur Kristóf <timur.kristof@gmail.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Mario Limonciello <superm1@kernel.org> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com> Cc: Roman Li <roman.li@amd.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Promote DC to 3.2.369Taimur Hassan
This version brings along following update: -Fix system resume lag issue -Correct hubp GfxVersion verification -Add parse all extension blocks for VSDB -Increase DCN35 SR enter/exit latency -Refactor virtual directory reorganize encoder and hwss files -Set enable_legacy_fast_update to false for DCN36 -Have dm_atomic_state context aligned with dc_state current -Avoid updating surface with the same surface under MPO Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: [FW Promotion] Release 0.1.46.0Taimur Hassan
Add some struct member and enum for panel replay Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Fix the incorrect type in dml_printAlex Hung
[Why & How] soc->max_outstanding_reqs is a dml_uint_t, not a dml_float_t. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: bypass post csc for additional color spaces in dalClay King
[Why] For RGB BT2020 full and limited color spaces, overlay adjustments were applied twice (once by MM and once by DAL). This results in incorrect colours and a noticeable difference between mpo and non-mpo cases. [How] Add RGB BT2020 full and limited color spaces to list that bypasses post csc adjustment. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Clay King <clayking@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Revert "Migrate DCCG register access from hwseq to dccg ↵Nicholas Carbones
component." [Why & How] This reverts commit 949adb4789fe3c24eea01d9c2efe94ab92694a0d, which causes regressions related to HDCP when resuming from S3. Reviewed-by: Joshua Aberback <joshua.aberback@amd.com> Signed-off-by: Nicholas Carbones <ncarbone@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Correct hubp GfxVersion verificationNicholas Carbones
[Why] DcGfxBase case was not accounted for in hubp program tiling functions, causing tiling corruption on PNP. [How] Add handling for DcGfxBase so that tiling gets properly cleared. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Nicholas Carbones <ncarbone@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysRevert "drm/amd/display: mouse event trigger to boost RR when idle"Muaaz Nisar
This reverts commit ba448f9ed62cf5a89603a738e6de91fc6c42ab35. It cause some regression. Reviewed-by: Sreeja Golui <sreeja.golui@amd.com> Signed-off-by: Muaaz Nisar <muanisar@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Parse all extension blocks for VSDBRay Wu
[Why] VSDB parsing loop only searched within the first extension block. If the VSDB was located in a subsequent extension block, it would not be found. [How] Calculate the total length of all extension blocks (EDID_LENGTH * edid->extensions) and use that as the loop boundary, allowing the parser to search through all available extension blocks. Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysMerge tag 'mm-nonmm-stable-2026-02-12-10-48' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...
11 daysdrm/amd/display: Make GPIO HPD path conditionalRoman Li
[Why] Avoid unnecessary GPIO configuration attempts on dcn that doesn't support it. [How] Conditionally use GPIO HPD detection or rely on hw encoder path. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Increase DCN35 SR enter/exit latencyLeo Li
[Why & How] On Framework laptops with DDR5 modules, underflow can be observed. It's unclear why it only occurs on specific desktop contents. However, increasing enter/exit latencies by 3us seems to resolve it. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4463 Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
11 daysdrm/amd/display: guard NULL manual-trigger callback in cursor programmingVitaly Prosyak
KASAN reports a NULL instruction fetch (RIP=0x0) from dc_stream_program_cursor_position(): BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:0x0 Call Trace: dc_stream_program_cursor_position+0x344/0x920 [amdgpu] amdgpu_dm_atomic_commit_tail+... [ +1.041013] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ +0.000027] #PF: supervisor instruction fetch in kernel mode [ +0.000013] #PF: error_code(0x0010) - not-present page [ +0.000012] PGD 0 P4D 0 [ +0.000017] Oops: Oops: 0010 [#1] SMP KASAN NOPTI [ +0.000017] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G E 6.18.0+ #3 PREEMPT(voluntary) [ +0.000023] Tainted: [E]=UNSIGNED_MODULE [ +0.000010] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000016] Workqueue: events drm_mode_rmfb_work_fn [ +0.000022] RIP: 0010:0x0 [ +0.000017] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ +0.000015] RSP: 0018:ffffc9000017f4c8 EFLAGS: 00010246 [ +0.000016] RAX: 0000000000000000 RBX: ffff88810afdda80 RCX: 1ffff110457000d1 [ +0.000014] RDX: 1ffffffff87b75bd RSI: 0000000000000000 RDI: ffff88810afdda80 [ +0.000014] RBP: ffffc9000017f538 R08: 0000000000000000 R09: ffff88822b800690 [ +0.000013] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc3dbac20 [ +0.000014] R13: 0000000000000000 R14: ffff88811ab80000 R15: dffffc0000000000 [ +0.000014] FS: 0000000000000000(0000) GS:ffff888434599000(0000) knlGS:0000000000000000 [ +0.000015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000013] CR2: ffffffffffffffd6 CR3: 000000010ee88000 CR4: 0000000000350ef0 [ +0.000014] Call Trace: [ +0.000010] <TASK> [ +0.000010] dc_stream_program_cursor_position+0x344/0x920 [amdgpu] [ +0.001086] ? __pfx_mutex_lock+0x10/0x10 [ +0.000015] ? unwind_next_frame+0x18b/0xa70 [ +0.000019] amdgpu_dm_atomic_commit_tail+0x1124/0xfa20 [amdgpu] [ +0.001040] ? ret_from_fork_asm+0x1a/0x30 [ +0.000018] ? filter_irq_stacks+0x90/0xa0 [ +0.000022] ? __pfx_amdgpu_dm_atomic_commit_tail+0x10/0x10 [amdgpu] [ +0.001058] ? kasan_save_track+0x18/0x70 [ +0.000015] ? kasan_save_alloc_info+0x37/0x60 [ +0.000015] ? __kasan_kmalloc+0xc3/0xd0 [ +0.000013] ? __kmalloc_cache_noprof+0x1aa/0x600 [ +0.000016] ? drm_atomic_helper_setup_commit+0x788/0x1450 [ +0.000017] ? drm_atomic_helper_commit+0x7e/0x290 [ +0.000014] ? drm_atomic_commit+0x205/0x2e0 [ +0.000015] ? process_one_work+0x629/0xf80 [ +0.000016] ? worker_thread+0x87f/0x1570 [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __kasan_check_write+0x14/0x30 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? _raw_spin_lock_irq+0x8a/0xf0 [ +0.000015] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ +0.000016] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __wait_for_common+0x204/0x460 [ +0.000015] ? sched_clock_noinstr+0x9/0x10 [ +0.000014] ? __pfx_schedule_timeout+0x10/0x10 [ +0.000014] ? local_clock_noinstr+0xe/0xd0 [ +0.000015] ? __pfx___wait_for_common+0x10/0x10 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __wait_for_common+0x204/0x460 [ +0.000014] ? __pfx_schedule_timeout+0x10/0x10 [ +0.000015] ? __kasan_kmalloc+0xc3/0xd0 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? wait_for_completion_timeout+0x1d/0x30 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? drm_crtc_commit_wait+0x32/0x180 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? drm_atomic_helper_wait_for_dependencies+0x46a/0x800 [ +0.000019] commit_tail+0x231/0x510 [ +0.000017] drm_atomic_helper_commit+0x219/0x290 [ +0.000015] ? __pfx_drm_atomic_helper_commit+0x10/0x10 [ +0.000016] drm_atomic_commit+0x205/0x2e0 [ +0.000014] ? __pfx_drm_atomic_commit+0x10/0x10 [ +0.000013] ? __pfx_drm_connector_free+0x10/0x10 [ +0.000014] ? __pfx___drm_printfn_info+0x10/0x10 [ +0.000017] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? drm_atomic_set_crtc_for_connector+0x49e/0x660 [ +0.000015] ? drm_atomic_set_fb_for_plane+0x155/0x290 [ +0.000015] drm_framebuffer_remove+0xa9b/0x1240 [ +0.000014] ? finish_task_switch.isra.0+0x15a/0x840 [ +0.000015] ? __switch_to+0x385/0xda0 [ +0.000015] ? srso_safe_ret+0x1/0x20 [ +0.000013] ? __pfx_drm_framebuffer_remove+0x10/0x10 [ +0.000016] ? kasan_print_address_stack_frame+0x221/0x280 [ +0.000015] drm_mode_rmfb_work_fn+0x14b/0x240 [ +0.000015] process_one_work+0x629/0xf80 [ +0.000012] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000019] worker_thread+0x87f/0x1570 [ +0.000013] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000014] ? __pfx_try_to_wake_up+0x10/0x10 [ +0.000017] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? kasan_print_address_stack_frame+0x227/0x280 [ +0.000017] ? __pfx_worker_thread+0x10/0x10 [ +0.000014] kthread+0x396/0x830 [ +0.000013] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ +0.000015] ? __pfx_kthread+0x10/0x10 [ +0.000012] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? recalc_sigpending+0x180/0x210 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __pfx_kthread+0x10/0x10 [ +0.000014] ret_from_fork+0x31c/0x3e0 [ +0.000014] ? __pfx_kthread+0x10/0x10 [ +0.000013] ret_from_fork_asm+0x1a/0x30 [ +0.000019] </TASK> [ +0.000010] Modules linked in: rfcomm(E) cmac(E) algif_hash(E) algif_skcipher(E) af_alg(E) snd_seq_dummy(E) snd_hrtimer(E) qrtr(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_mark(E) xt_tcpudp(E) nft_compat(E) nf_tables(E) x_tables(E) bnep(E) snd_hda_codec_alc882(E) snd_hda_codec_atihdmi(E) snd_hda_codec_realtek_lib(E) snd_hda_codec_hdmi(E) snd_hda_codec_generic(E) iwlmvm(E) snd_hda_intel(E) binfmt_misc(E) snd_hda_codec(E) snd_hda_core(E) mac80211(E) snd_intel_dspcfg(E) snd_intel_sdw_acpi(E) snd_hwdep(E) snd_pcm(E) libarc4(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_rawmidi(E) amd_atl(E) intel_rapl_msr(E) snd_seq(E) intel_rapl_common(E) iwlwifi(E) jc42(E) snd_seq_device(E) btusb(E) snd_timer(E) btmtk(E) btrtl(E) edac_mce_amd(E) eeepc_wmi(E) polyval_clmulni(E) btbcm(E) ghash_clmulni_intel(E) asus_wmi(E) ee1004(E) platform_profile(E) btintel(E) snd(E) nls_iso8859_1(E) aesni_intel(E) soundcore(E) i2c_piix4(E) cfg80211(E) sparse_keymap(E) wmi_bmof(E) bluetooth(E) k10temp(E) rapl(E) [ +0.000300] i2c_smbus(E) ccp(E) joydev(E) input_leds(E) gpio_amdpt(E) mac_hid(E) sch_fq_codel(E) msr(E) parport_pc(E) ppdev(E) lp(E) parport(E) efi_pstore(E) nfnetlink(E) dmi_sysfs(E) autofs4(E) cdc_ether(E) usbnet(E) amdgpu(E) amdxcp(E) hid_generic(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_exec(E) drm_panel_backlight_quirks(E) gpu_sched(E) drm_suballoc_helper(E) video(E) drm_buddy(E) usbhid(E) drm_display_helper(E) r8152(E) hid(E) mii(E) cec(E) ahci(E) rc_core(E) igc(E) libahci(E) wmi(E) [ +0.000294] CR2: 0000000000000000 [ +0.000013] ---[ end trace 0000000000000000 ]--- The crash happens when we unconditionally call into the timing generator manual trigger hook: pipe_ctx->stream_res.tg->funcs->program_manual_trigger(...) On some configurations the timing generator (tg), its funcs table, or the program_manual_trigger callback can be NULL. Guard all of these before calling the hook. If the first pipe matching the stream cannot trigger, keep scanning to find another matching pipe with a valid hook. The issue was originally found on Vg20/DCE 12.1 Mario successfully tested on Polaris 11/DCE 11.2 Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Alexander Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Fixes: ba448f9ed62c ("drm/amd/display: mouse event trigger to boost RR when idle") Suggested-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: use enum value for panel replay settingPeichen Huang
[WHY & HOW] use enum value for Panel Replay setting. Reviewed-by: Robin Chen <robin.chen@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Refactor virtual directory reorganize encoder and hwss files.Bhuvanachandra Pinninti
[why] Virtual encoders & hwss were grouped in a separate directory, not aligned with dio and link component structure. [how] Moved virtual_link_encoder and virtual_stream_encoder to dc/dio/virtual/. Moved virtual_link_hwss to dc/link/hwss/ and renamed to link_hwss_virtual. Removed dc/virtual/ directory. Updated all includes and build files (Makefiles) Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Bhuvanachandra Pinninti <bpinnint@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: set enable_legacy_fast_update to false for DCN36YiLing Chen
[Why/How] Align the default value of the flag with DCN35/351. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: YiLing Chen <yi-lchen@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Check frame skip capability in Sink sideLeon Huang
[Why&How] Frame skip capability is described in AMD VSDB in EDID. Need to retrieve the cap and determine fr.skipping mode enablement Reviewed-by: ChunTao Tso <chuntao.tso@amd.com> Signed-off-by: Leon Huang <Leon.Huang1@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Avoid updating surface with the same surface under MPOWayne Lin
[Why & How] Although it's dummy updates of surface update for committing stream updates, we should not have dummy_updates[j].surface all indicating to the same surface under multiple surfaces case. Otherwise, copy_surface_update_to_plane() in update_planes_and_stream_state() will update to the same surface only. Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/display: Fix system resume lag issueTom Chung
[Why] System will try to apply idle power optimizations setting during system resume. But system power state is still in D3 state, and it will cause the idle power optimizations command not actually to be sent to DMUB and cause some platforms to go into IPS. [How] Set power state to D0 first before calling the dc_dmub_srv_apply_idle_power_optimizations(dm->dc, false) Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/pm: Use U64 for accumulation counterAsad Kamal
Use U64 for accumulation counter in gpu metrics for smu_v13_0_6 and smu_v13_0_12 Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 daysdrm/amd/pm: Add acc counter & fw timestamp to xcp metricsAsad Kamal
Add accumulation counter and firmware timestamp to partition metrics for smu_v13_0_6 & smu_v13_0_12 v2: Use U64 for accumulation counter (Lijo) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
12 daysMerge tag 'drm-next-2026-02-11' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - amdgpu support for lots of new IP blocks which means newer GPUs - xe has a lot of SR-IOV and SVM improvements - lots of intel display refactoring across i915/xe - msm has more support for gen8 platforms - Given up on kgdb/kms integration, it's too hard on modern hw core: - drop kgdb support - replace system workqueue with percpu - account for property blobs in memcg - MAINTAINERS updates for xe + buddy rust: - Fix documentation for Registration constructors - Use pin_init::zeroed() for fops initialization - Annotate DRM helpers with __rust_helper - Improve safety documentation for gem::Object::new() - Update AlwaysRefCounted imports - mm: Prevent integer overflow in page_align() atomic: - add drm_device pointer to drm_private_obj - introduce gamma/degamma LUT size check buddy: - fix free_trees memory leak - prevent BUG_ON bridge: - introduce drm_bridge_unplug/enter/exit - add connector argument to .hpd_notify - lots of recounting conversions - convert rockchip inno hdmi to bridge - lontium-lt9611uxc: switch to HDMI audio helpers - dw-hdmi-qp: add support for HPD-less setups - Algoltek AG6311 support panels: - edp: CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H - st75751: add SPI support - Sitronix ST7920, Samsung LTL106HL02 - LG LH546WF1-ED01, HannStar HSD156J - BOE NV130WUM-T08 - Innolux G150XGE-L05 - Anbernic RG-DS dma-buf: - improve sg_table debugging - add tracepoints - call clear_page instead of memset - start to introduce cgroup memory accounting in heaps - remove sysfs stats dma-fence: - add new helpers dp: - mst: avoid oob access with vcpi=0 hdmi: - limit infoframes exposure to userspace gem: - reduce page table overhead with THP - fix leak in drm_gem_get_unmapped_area gpuvm: - API sanitation for rust bindings sched: - introduce new helpers panic: - report invalid panic modes - add kunit tests i915/xe display: - Expose sharpness only if num_scalers is >= 2 - Add initial Xe3P_LPD for NVL - BMG FBC support - Add MTL+ platforms to support dpll framework _ fix DIMM_S DRM decoding on ICL - Return to using AUX interrupts - PSR/Panel replay refactoring - use consolidation HDMI tables - Xe3_LPD CD2X dividier changes xe: - vfio: add vfio_pci for intel GPU - multi queue support - dynamic pagemaps and multi-device SVM - expose temp attribs in hwmon - NO_COMPRESSION bo flag - expose MERT OA unit - sysfs survivability refactor - SRIOV PF: add MERT support - enable SR-IOV VF migration - Enable I2C/NVM on Crescent Island - Xe3p page reclaimation support - introduce SRIOV scheduler groups - add SoC remappt support in system controller - insert compiler barriers in GuC code - define NVL GuC firmware - handle GT resume failure - fix drm scheduler layering violations - enable GSC loading and PXP for PTL - disable GuC Power DCC strategy on PTL - unregister drm device on probe error i915: - move to kernel standard fault injection - bump recommended GuC version for DG2 and MTL amdgpu: - SMUIO 15.x, PSP 15.x support - IH 6.1.1/7.1 support - MMHUB 3.4/4.2 support - GC 11.5.4/12.1 support - SDMA 6.1.4/7.1/7.11.4 support - JPEG 5.3 support - UserQ updates - GC 9 gfx queue reset support - TTM memory ops parallelization - convert legacy logging to new helpers - DC analog fixes amdkfd: - GC 11.5.4/12.1 suppport - SDMA 6.1.4/7.1 support - per context support - increase kfd process hash table - Reserved SDMA rework radeon: - convert legacy logging to new helpers - use devm for i2c adapters msm: - GPU - Document a612/RGMU dt bindings - UBWC 6.0 support (for A840 / Kaanapali) - a225 support - DPU: - Switch to use virtual planes by default - Fix DSI CMD panels on DPU 3.x - Rewrite format handling to remove intermediate representation - Fix watchdog on DPU 8.x+ - Fix TE / Vsync source setting on DPU 8.x+ - Add 3D_Mux on SC7280 - Kaanapali platform support - Fix UBWC register programming - Make RM reserve DSPP-enabled mixers for CRTCs with LMs - Gamma correction support - DP: - Enable support for eDP 1.4+ link rate tables - Fix MDSS1 DP indices on SA8775P, making them to work - Fix msm_dp_ctrl_config_msa() to work with LLVM 20 - DSI: - Document QCS8300 as compatible with SA8775P - Kaanapali platform support - DSI PHY: - switch to divider_determine_rate() - MDP5: - Drop support for MSM8998, SDM660 and SDM630 (switch over to DPU) - MDSS: - Kaanapali platform support - Fixed UBWC register programming nova-core: - Prepare for Turing support. This includes parsing and handling Turing-specific firmware headers and sections as well as a Turing Falcon HAL implementation - Get rid of the Result<impl PinInit<T, E>> anti-pattern - Relocate initializer-specific code into the appropriate initializer - Use CStr::from_bytes_until_nul() to remove custom helpers - Improve handling of unexpected firmware values - Clean up redundant debug prints - Replace c_str!() with native Rust C-string literals - Update nova-core task list nova: - Align GEM object size to system page size tyr: - Use generated uAPI bindings for GpuInfo - Replace manual sleeps with read_poll_timeout() - Replace c_str!() with native Rust C-string literals - Suppress warnings for unread fields - Fix incorrect register name in print statement nouveau: - fix big page table support races in PTE management - improve reclocking on tegra 186+ amdxdna: - fix suspend race conditions - improve handling of zero tail pointers - fix cu_idx overwritten during command setup - enable hardware context priority - remove NPU2 support - update message buffer allocation requirements - update firmware version check ast: - support imported cursor buffers - big endian fixes etnaviv: - add PPU flop reset support imagination: - add AM62P support - introduce hw version checks ivpu: - implement warm boot flow panfrost: - add bo sync ioctl - add GPU_PM_RT support for RZ/G3E SoC panthor: - add bo sync ioctl - enable timestamp propagation - scheduler robustness improvements - VM termination fixes - huge page support rockchip: - RK3368 HDMI Support - get rid of atomic_check fixups - RK3506 support - RK3576/RK3588 improved HPD handling rz-du: - RZ/V2H(P) MIPI-DSI Support v3d: - fix DMA segment size - convert to new logging helpers mediatek: - move DP training to hotplug thread - convert logging to new helpers - add support for HS speed DSI - Genio 510/700/1200-EVK, Radxa NIO-12L HDMI support atmel-hlcdc: - switch to drmm resource - support nomodeset - use newer helpers hisilicon: - fix various DP bugs renesas: - fix kernel panic on reboot exynos: - fix vidi_connection_ioctl using wrong device - fix vidi_connection deref user ptr - fix concurrency regression with vidi_context vkms: - add configfs support for display configuration * tag 'drm-next-2026-02-11' of https://gitlab.freedesktop.org/drm/kernel: (1610 commits) drm/xe/pm: Disable D3Cold for BMG only on specific platforms drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_dep drm/xe: Fix kerneldoc for xe_gt_tlb_inval_init_early drm/xe: Fix kerneldoc for xe_migrate_exec_queue drm/xe/query: Fix topology query pointer advance drm/xe/guc: Fix kernel-doc warning in GuC scheduler ABI header drm/xe/guc: Fix CFI violation in debugfs access. accel/amdxdna: Move RPM resume into job run function accel/amdxdna: Fix incorrect DPM level after suspend/resume nouveau/vmm: start tracking if the LPT PTE is valid. (v6) nouveau/vmm: increase size of vmm pte tracker struct to u32 (v2) nouveau/vmm: rewrite pte tracker using a struct and bitfields. accel/amdxdna: Fix incorrect error code returned for failed chain command accel/amdxdna: Remove hardware context status drm/bridge: imx8qxp-pixel-combiner: Fix bailout for imx8qxp_pc_bridge_probe() drm/panel: ilitek-ili9882t: Remove duplicate initializers in tianma_il79900a_dsc drm/i915/display: fix the pixel normalization handling for xe3p_lpd drm/exynos: vidi: use ctx->lock to protect struct vidi_context member variables related to memory alloc/free drm/exynos: vidi: fix to avoid directly dereferencing user pointer drm/exynos: vidi: use priv->vidi_dev for ctx lookup in vidi_connection_ioctl() ...
2026-02-05drm/amdgpu: Send applicable RMA CPERs at end of RAS initKent Russell
Firmware and monitoring tools may not be ready to receive a CPER when we read the bad pages, so send the CPERs at the end of RAS initialization to ensure that the FW is ready to receive and process the CPER. This removes the previous CPER submission that was added during bad page load, and sends both in-band and out-of-band at the same time. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amd: Fix hang on amdgpu unload by using pci_dev_is_disconnected()Mario Limonciello
The commit 6a23e7b4332c ("drm/amd: Clean up kfd node on surprise disconnect") introduced early KFD cleanup when drm_dev_is_unplugged() returns true. However, this causes hangs during normal module unload (rmmod amdgpu). The issue occurs because drm_dev_unplug() is called in amdgpu_pci_remove() for all removal scenarios, not just surprise disconnects. This was done intentionally in commit 39934d3ed572 ("Revert "drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled"") to fix IGT PCI software unplug test failures. As a result, drm_dev_is_unplugged() returns true even during normal module unload, triggering the early KFD cleanup inappropriately. The correct check should distinguish between: - Actual surprise disconnect (eGPU unplugged): pci_dev_is_disconnected() returns true - Normal module unload (rmmod): pci_dev_is_disconnected() returns false Replace drm_dev_is_unplugged() with pci_dev_is_disconnected() to ensure the early cleanup only happens during true hardware disconnect events. Cc: stable@vger.kernel.org Reported-by: Cal Peake <cp@absolutedigital.net> Closes: https://lore.kernel.org/all/b0c22deb-c0fa-3343-33cf-fd9a77d7db99@absolutedigital.net/ Fixes: 6a23e7b4332c ("drm/amd: Clean up kfd node on surprise disconnect") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amdgpu: re-add the bad job to the pending list for ring resetsAlex Deucher
Returning DRM_GPU_SCHED_STAT_NO_HANG causes the scheduler to add the bad job back the pending list. We've already set the errors on the fence and killed the bad job at this point so it's the correct behavior. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amdgpu: avoid sdma ring reset in sriovVictor Zhao
sdma ring reset is not supported in SRIOV. kfd driver does not check reset mask, and could queue sdma ring reset during unmap_queues_cpsch. Avoid the ring reset for sriov. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amdgpu: clean up the amdgpu_cs_parser_bosSunil Khatri
In low memory conditions, kmalloc can fail. In such conditions unlock the mutex for a clean exit. We do not need to amdgpu_bo_list_put as it's been handled in the amdgpu_cs_parser_fini. Fixes: 737da5363cc0 ("drm/amdgpu: update the functions to use amdgpu version of hmm") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202602030017.7E0xShmH-lkp@intel.com/ Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amdgpu: Use 5-level paging if gmc support 57-bit VAPhilip Yang
Regardless if CPU enable 5-level paging, GPU vm use 5-level paging if gmc init with 57-bit address space support, because ARM64 4-level paging support 48-bit VA, x86 and GPU 4-level paging support 47-bit VA, require 5-level paging on GPU to support ARM64. NPA address space 52-bit mapping on NPA GPU VM require 5-level paging. Debugger trap get device snapshot expect LDS and Scratch base, limit above 57-bit, which is set only for 5-level paging. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.19.x
2026-02-05drm/amd/display: expose plane blend LUT in HW with MCMMelissa Wen
Since commit 39923050615cd ("drm/amd/display: Clear DPP 3DLUT Cap") there is a flag in the mpc_color_caps that indicates the pre-blend usage of MPC color caps. Do the same as commit 9e5d4a5e27c6 ("drm/amd/display: Use mpc.preblend flag to indicate preblend") and use the mpc.preblend flag to expose plane blend LUT/TF properties on AMD display driver. CC: Matthew Schwartz <matthew.schwartz@linux.dev> Signed-off-by: Melissa Wen <mwen@igalia.com> Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amdgpu: Protect GPU register accesses in powergated state in some pathsYifan Zhang
Ungate GPU CG/PG in device_fini_hw and device_halt to protect GPU register accesses, e.g. GC registers are accessed in amdgpu_irq_disable_all() and amdgpu_fence_driver_hw_fini(). Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2026-02-05drm/amdkfd: Fix out-of-bounds write in kfd_event_page_set()Sunday Clement
The kfd_event_page_set() function writes KFD_SIGNAL_EVENT_LIMIT * 8 bytes via memset without checking the buffer size parameter. This allows unprivileged userspace to trigger an out-of bounds kernel memory write by passing a small buffer, leading to potential privilege escalation. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2026-02-05drm/amdgpu/sdma6: enable queue resets unconditionallyAlex Deucher
There is no firmware version dependency. This also enables sdma queue resets on all SDMA 6.x based chips. Fixes: 59fd50b8663b ("drm/amdgpu: Add sysfs interface for sdma reset mask") Cc: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amdgpu/sdma5.2: enable queue resets unconditionallyAlex Deucher
There is no firmware version dependency. This also enables sdma queue resets on all SDMA 5.2.x based chips. Fixes: 59fd50b8663b ("drm/amdgpu: Add sysfs interface for sdma reset mask") Cc: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amdgpu/sdma5: enable queue resets unconditionallyAlex Deucher
There is no firmware version dependency. Fixes: 59fd50b8663b ("drm/amdgpu: Add sysfs interface for sdma reset mask") Cc: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amdgpu: Fix memory leak in amdgpu_ras_init()Zilin Guan
When amdgpu_nbio_ras_sw_init() fails in amdgpu_ras_init(), the function returns directly without freeing the allocated con structure, leading to a memory leak. Fix this by jumping to the release_con label to properly clean up the allocated memory before returning the error code. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: fdc94d3a8c88 ("drm/amdgpu: Rework pcie_bif ras sw_init") Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amdgpu: Use kvfree instead of kfree in amdgpu_gmc_get_nps_memranges()Zilin Guan
amdgpu_discovery_get_nps_info() internally allocates memory for ranges using kvcalloc(), which may use vmalloc() for large allocation. Using kfree() to release vmalloc memory will lead to a memory corruption. Use kvfree() to safely handle both kmalloc and vmalloc allocations. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: b194d21b9bcc ("drm/amdgpu: Use NPS ranges from discovery table") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amdgpu: Fix memory leak in amdgpu_acpi_enumerate_xcc()Zilin Guan
In amdgpu_acpi_enumerate_xcc(), if amdgpu_acpi_dev_init() returns -ENOMEM, the function returns directly without releasing the allocated xcc_info, resulting in a memory leak. Fix this by ensuring that xcc_info is properly freed in the error paths. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: 4d5275ab0b18 ("drm/amdgpu: Add parsing of acpi xcc objects") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05drm/amd/ras: statistic xgmi training error countStanley.Yang
Report xgmi training error uncorrectable error count. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>