summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2023-09-26i915/guc: Get runtime pm in busyness worker only if already activeUmesh Nerlige Ramappa
Ideally the busyness worker should take a gt pm wakeref because the worker only needs to be active while gt is awake. However, the gt_park path cancels the worker synchronously and this complicates the flow if the worker is also running at the same time. The cancel waits for the worker and when the worker releases the wakeref, that would call gt_park and would lead to a deadlock. The resolution is to take the global pm wakeref if runtime pm is already active. If not, we don't need to update the busyness stats as the stats would already be updated when the gt was parked. Note: - We do not requeue the worker if we cannot take a reference to runtime pm since intel_guc_busyness_unpark would requeue the worker in the resume path. - If the gt was parked longer than time taken for GT timestamp to roll over, we ignore those rollovers since we don't care about tracking the exact GT time. We only care about roll overs when the gt is active and running workloads. - There is a window of time between gt_park and runtime suspend, where the worker may run. This is acceptable since the worker will not find any new data to update busyness. v2: (Daniele) - Edit commit message and code comment - Use runtime pm in the worker - Put runtime pm after enabling the worker - Use Link tag and add Fixes tag v3: (Daniele) - Reword commit and comments and add details Link: https://gitlab.freedesktop.org/drm/intel/-/issues/7077 Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230925192117.2497058-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit e2f99b79d4c594cdf7ab449e338d4947f5ea8903) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-09-26drm/i915/gt: Fix reservation address in ggtt_reserve_guc_topJavier Pello
There is an assertion in ggtt_reserve_guc_top that the global GTT is of size at least GUC_GGTT_TOP, which is not the case on a 32-bit platform; see commit 562d55d991b39ce376c492df2f7890fd6a541ffc ("drm/i915/bdw: Only use 2g GGTT for 32b platforms"). If GEM_BUG_ON is enabled, this triggers a BUG(); if GEM_BUG_ON is disabled, the subsequent reservation fails and the driver fails to initialise the device: i915 0000:00:02.0: [drm:i915_init_ggtt [i915]] Failed to reserve top of GGTT for GuC i915 0000:00:02.0: Device initialization failed (-28) i915 0000:00:02.0: Please file a bug on drm/i915; see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details. i915: probe of 0000:00:02.0 failed with error -28 Make the reservation at the top of the available space, whatever that is, instead of assuming that the top will be GUC_GGTT_TOP. Fixes: 911800765ef6 ("drm/i915/uc: Reserve upper range of GGTT") Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9080 Signed-off-by: Javier Pello <devel@otheo.eu> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Fernando Pacheco <fernando.pacheco@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.3+ Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230902171039.2229126186d697dbcf62d6d8@otheo.eu (cherry picked from commit 0f3fa942d91165c2702577e9274d2ee1c7212afc) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-09-26i915: Limit the length of an sg list to the requested lengthMatthew Wilcox (Oracle)
The folio conversion changed the behaviour of shmem_sg_alloc_table() to put the entire length of the last folio into the sg list, even if the sg list should have been shorter. gen8_ggtt_insert_entries() relied on the list being the right length and would overrun the end of the page tables. Other functions may also have been affected. Clamp the length of the last entry in the sg list to be the expected length. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Fixes: 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a folio_batch") Cc: stable@vger.kernel.org # 6.5.x Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256 Link: https://lore.kernel.org/lkml/6287208.lOV4Wx5bFT@natalenko.name/ Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230919194855.347582-1-willy@infradead.org (cherry picked from commit 26a8e32e6d77900819c0c730fbfb393692dbbeea) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-09-22Merge tag 'amd-drm-fixes-6.6-2023-09-20' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.6-2023-09-20: amdgpu: - MST fix - Vbios part number reporting fix - Fix a possible memory leak in an error case in the RAS code - Fix low resolution modes on eDP amdkfd: - Fix GPU address for user queue wptr when GART is not at 0 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230920222915.7789-1-alexander.deucher@amd.com
2023-09-22Merge tag 'drm-intel-fixes-2023-09-21' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Prevent error pointer dereference (Dan Carpenter) - Fix PMU busyness values when using GuC mode (Umesh) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZQxf267jxc7tiIlZ@intel.com
2023-09-22Merge tag 'drm-misc-fixes-2023-09-21' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * DRM MM-test fixes * Fbdev Kconfig fixes * ivpu: * IRQ-handling fixes * meson: * Fix memory leak in HDMI EDID code * nouveau: * Correct type casting * Fix memory leak in scheduler * u_memcpya() fixes * virtio: * Fence cleanups Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230921153712.GA14059@linux-uq9g
2023-09-20drm/amdkfd: Use gpu_offset for user queue's wptrYuBiao Wang
Directly use tbo's start address will miss the domain start offset. Need to use gpu_offset instead. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-09-20drm/amd/display: fix the ability to use lower resolution modes on eDPHamza Mahfooz
On eDP we can receive invalid modes from dm_update_crtc_state() for entirely new streams for which drm_mode_set_crtcinfo() shouldn't be called on. So, instead of calling drm_mode_set_crtcinfo() from within create_stream_for_sink() we can instead call it from amdgpu_dm_connector_mode_valid(). Since, we are guaranteed to only call drm_mode_set_crtcinfo() for valid modes from that function (invalid modes are rejected by that callback) and that is the only user of create_validate_stream_for_sink() that we need to call drm_mode_set_crtcinfo() for (as before commit cb841d27b876 ("drm/amd/display: Always pass connector_state to stream validation"), that is the only place where create_validate_stream_for_sink()'s dm_state was NULL). Cc: stable@vger.kernel.org Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2693 Fixes: cb841d27b876 ("drm/amd/display: Always pass connector_state to stream validation") Tested-by: Mark Broadworth <mark.broadworth@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20drm/amdgpu: fix a memory leak in amdgpu_ras_feature_enableCong Liu
This patch fixes a memory leak in the amdgpu_ras_feature_enable() function. The leak occurs when the function sends a command to the firmware to enable or disable a RAS feature for a GFX block. If the command fails, the kfree() function is not called to free the info memory. Fixes: 9f051d6ff13f ("drm/amdgpu: Free ras cmd input buffer properly") Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Cong Liu <liucong2@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20Revert "drm/amdgpu: Report vbios version instead of PN"Lijo Lazar
This reverts commit 7748ce5b69581325cae40c2134088820f0957902. vbios_version sysfs node is used to identify Part Number also. Revert to the same so that it doesn't break scripts/software which parse this. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20drm/amd/display: Fix MST recognizes connected displays as oneMuhammad Ahmed
[What] MST now recognizes both connected displays Fixes: 927e784c180c ("drm/amd/display: Add symclk enable/disable during stream enable/disable") Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Muhammad Ahmed <ahmed.ahmed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-21drm/virtio: clean out_fence on complete_submitJosé Pekkarinen
The removed line prevents the following cleanup function to execute a dma_fence_put on the out_fence to free its memory, producing the following output in kmemleak: unreferenced object 0xffff888126d8ee00 (size 128): comm "kwin_wayland", pid 981, jiffies 4295380296 (age 390.060s) hex dump (first 32 bytes): c8 a1 c2 27 81 88 ff ff e0 14 a9 c0 ff ff ff ff ...'............ 30 1a e1 2e a6 00 00 00 28 fc 5b 17 81 88 ff ff 0.......(.[..... backtrace: [<0000000011655661>] kmalloc_trace+0x26/0xa0 [<0000000055f15b82>] virtio_gpu_fence_alloc+0x47/0xc0 [virtio_gpu] [<00000000fa6d96f9>] virtio_gpu_execbuffer_ioctl+0x1a8/0x800 [virtio_gpu] [<00000000e6cb5105>] drm_ioctl_kernel+0x169/0x240 [drm] [<000000005ad33e27>] drm_ioctl+0x399/0x6b0 [drm] [<00000000a19dbf65>] __x64_sys_ioctl+0xc5/0x100 [<0000000011fa801e>] do_syscall_64+0x5b/0xc0 [<0000000065c76d8a>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 unreferenced object 0xffff888121930500 (size 128): comm "kwin_wayland", pid 981, jiffies 4295380313 (age 390.096s) hex dump (first 32 bytes): c8 a1 c2 27 81 88 ff ff e0 14 a9 c0 ff ff ff ff ...'............ f9 ec d7 2f a6 00 00 00 28 fc 5b 17 81 88 ff ff .../....(.[..... backtrace: [<0000000011655661>] kmalloc_trace+0x26/0xa0 [<0000000055f15b82>] virtio_gpu_fence_alloc+0x47/0xc0 [virtio_gpu] [<00000000fa6d96f9>] virtio_gpu_execbuffer_ioctl+0x1a8/0x800 [virtio_gpu] [<00000000e6cb5105>] drm_ioctl_kernel+0x169/0x240 [drm] [<000000005ad33e27>] drm_ioctl+0x399/0x6b0 [drm] [<00000000a19dbf65>] __x64_sys_ioctl+0xc5/0x100 [<0000000011fa801e>] do_syscall_64+0x5b/0xc0 [<0000000065c76d8a>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [...] This memleak will grow quickly, being possible to see the following line in dmesg after few minutes of life in the virtual machine: [ 706.217388] kmemleak: 10731 new suspected memory leaks (see /sys/kernel/debug/kmemleak) The patch will remove the line to allow the cleanup function do its job. Signed-off-by: José Pekkarinen <jose.pekkarinen@foxhound.fi> Fixes: e4812ab8e6b1 ("drm/virtio: Refactor and optimize job submission code path") Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230912060824.5210-1-jose.pekkarinen@foxhound.fi
2023-09-20i915/pmu: Move execlist stats initialization to execlist specific setupUmesh Nerlige Ramappa
engine->stats is a union of execlist and guc stat objects. When execlist specific fields are initialized, the initial state of guc stats is affected. This results in bad busyness values when using GuC mode. Move the execlist initialization from common code to execlist specific code. Fixes: 77cdd054dd2c ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230912212247.1828681-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit 4485bd519f5d6d620a29d0547ff3c982bdeeb468) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-09-20drm/i915/gt: Prevent error pointer dereferenceDan Carpenter
Move the check for "if (IS_ERR(obj))" in front of the call to i915_gem_object_set_cache_coherency() which dereferences "obj". Otherwise it will lead to a crash. Fixes: 43aa755eae2c ("drm/i915/mtl: Update cache coherency setting for context structure") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/455b2279-2e08-4d00-9784-be56d8ee42e3@moroto.mountain (cherry picked from commit c92ec50822fb84306d951520d81919328421acbd) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-09-20drm/meson: fix memory leak on ->hpd_notify callbackJani Nikula
The EDID returned by drm_bridge_get_edid() needs to be freed. Fixes: 0af5e0b41110 ("drm/meson: encoder_hdmi: switch to bridge DRM_BRIDGE_ATTACH_NO_CONNECTOR") Cc: Neil Armstrong <narmstrong@baylibre.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Cc: Neil Armstrong <neil.armstrong@linaro.org> Cc: Kevin Hilman <khilman@baylibre.com> Cc: Jerome Brunet <jbrunet@baylibre.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-amlogic@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230914131015.2472029-1-jani.nikula@intel.com
2023-09-20nouveau/u_memcpya: fix NULL vs error pointer bugDan Carpenter
The u_memcpya() function is supposed to return error pointers on error. Returning NULL will lead to an Oops. Fixes: e3885f712134 ("nouveau/u_memcpya: use vmemdup_user") Reviewed-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/10fd258b-466f-4c5b-9d48-fe61a3f21424@moroto.mountain
2023-09-20nouveau/u_memcpya: use vmemdup_userDave Airlie
I think there are limit checks in place for most things but the new uAPI wants to not have them. Add a limit check and use the vmemdup_user helper instead. Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Danilo Krummrich <dakr@redhat.com> Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230810185020.231135-1-airlied@gmail.com
2023-09-20drm/nouveau: sched: fix leaking memory of timedout jobDanilo Krummrich
Always stop and re-start the scheduler in order to let the scheduler free up the timedout job in case it got signaled. In case of exec jobs the job type specific callback will take care to signal all fences and tear down the channel. Fixes: b88baab82871 ("drm/nouveau: implement new VM_BIND uAPI") Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230916162835.5719-1-dakr@redhat.com
2023-09-20drm/nouveau: fence: fix type cast warning in nouveau_fence_emit()Danilo Krummrich
Fix the following warning. drivers/gpu/drm/nouveau/nouveau_fence.c:210:45: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct nouveau_channel *chan @@ got struct nouveau_channel [noderef] __rcu *channel We're just about to emit the fence, there is nothing to protect against yet, hence it is safe to just cast __rcu away. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309140340.BwKXzaDx-lkp@intel.com/ Fixes: 978474dc8278 ("drm/nouveau: fence: fix undefined fence state after emit") Signed-off-by: Danilo Krummrich <dakr@redhat.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230916011501.15813-1-dakr@redhat.com
2023-09-18drm: fix up fbdev Kconfig defaultsArnd Bergmann
As a result of the recent Kconfig reworks, the default settings for the framebuffer interfaces changed in unexpected ways: Configurations that leave CONFIG_FB disabled but use DRM now get DRM_FBDEV_EMULATION by default. This also turns on the deprecated /dev/fb device nodes for machines that don't actually want it. In turn, configurations that previously had DRM_FBDEV_EMULATION enabled now only get the /dev/fb front-end but not the more useful framebuffer console, which is not selected any more. We had previously decided that any combination of the three frontends (FB_DEVICE, FRAMEBUFFER_CONSOLE and LOGO) should be selectable, but the new default settings mean that a lot of defconfig files would have to get adapted. Change the defaults back to what they were in Linux 6.5: - Leave DRM_FBDEV_EMULATION turned off unless CONFIG_FB is enabled. Previously this was a hard dependency but now the two are independent. However, configurations that enable CONFIG_FB probably also want to keep the emulation for DRM, while those without FB presumably did that intentionally in the past. - Leave FB_DEVICE turned off for FB=n. Following the same logic, the deprecated option should not automatically get enabled here, most users that had FB turned off in the past do not want it, even if they want the console - Turn the FRAMEBUFFER_CONSOLE option on if DRM_FBDEV_EMULATION is set to avoid having to change defconfig files that relied on it being selected unconditionally in the past. This also makes sense since both LOGO and FB_DEVICE are now disabled by default for builds without CONFIG_FB, but DRM_FBDEV_EMULATION would make no sense if all three are disabled. Fixes: a5ae331edb02b ("drm: Drop select FRAMEBUFFER_CONSOLE for DRM_FBDEV_EMULATION") Fixes: 701d2054fa317 ("fbdev: Make support for userspace interfaces configurable") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230911205338.2385278-1-arnd@kernel.org
2023-09-15drm/tests: Fix incorrect argument in drm_test_mm_insert_rangeJanusz Krzysztofik
While drm_mm test was converted form igt selftest to kunit, unexpected value of "end" argument equal "start" was introduced to one of calls to a function that executes the drm_test_mm_insert_range for specific start/end pair of arguments. As a consequence, DRM_MM_BUG_ON(end <= start) is triggered. Fix it by restoring the original value. Fixes: fc8d29e298cf ("drm: selftest: convert drm_mm selftest to KUnit") Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: "Maíra Canal" <mairacanal@riseup.net> Cc: Arthur Grillo <arthurgrillo@riseup.net> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Daniel Latypov <dlatypov@google.com> Cc: stable@vger.kernel.org # v6.1+ Reviewed-by: Maíra Canal <mairacanal@riseup.net> Signed-off-by: Maíra Canal <mairacanal@riseup.net> Link: https://patchwork.freedesktop.org/patch/msgid/20230911130323.7037-2-janusz.krzysztofik@linux.intel.com
2023-09-15Merge tag 'drm-misc-fixes-2023-09-14' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * radeon: Uninterruptible fence waiting * tests: Fix use-after-free bug * vkms: Revert hrtimer fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230914122649.GA28252@linux-uq9g
2023-09-15Merge tag 'drm-intel-fixes-2023-09-14' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Only check eDP HPD when AUX CH is shared. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZQL+NqtIZH5F/Nxr@intel.com
2023-09-15Merge tag 'amd-drm-fixes-6.6-2023-09-13' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.6-2023-09-13: amdgpu: - GC 9.4.3 fixes - Fix white screen issues with S/G display on system with >= 64G of ram - Replay fixes - SMU 13.0.6 fixes - AUX backlight fix - NBIO 4.3 SR-IOV fixes for HDP - RAS fixes - DP MST resume fix - Fix segfault on systems with no vbios - DPIA fixes amdkfd: - CWSR grace period fix - Unaligned doorbell fix - CRIU fix for GFX11 - Add missing TLB flush on gfx10 and newer Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230913195009.7714-1-alexander.deucher@amd.com
2023-09-14Merge tag 'drm-misc-fixes-2023-09-07' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes One doc fix for drm/connector, one fix for amdgpu for an crash when VRAM usage is high, and one fix in gm12u320 to fix the timeout units in the code Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/w5nlld5ukeh6bgtljsxmkex3e7s7f4qquuqkv5lv4cv3uxzwqr@pgokpejfsyef
2023-09-14Revert "drm/vkms: Fix race-condition between the hrtimer and the atomic commit"Maíra Canal
This reverts commit a0e6a017ab56936c0405fe914a793b241ed25ee0. Unlocking a mutex in the context of a hrtimer callback is violating mutex locking rules, as mutex_unlock() from interrupt context is not permitted. Link: https://lore.kernel.org/dri-devel/ZQLAc%2FFwkv%2FGiVoK@phenom.ffwll.local/T/#t Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Maíra Canal <mcanal@igalia.com> Signed-off-by: Maíra Canal <mairacanal@riseup.net> Link: https://patchwork.freedesktop.org/patch/msgid/20230914102024.1789154-1-mcanal@igalia.com
2023-09-12drm/amdkfd: Insert missing TLB flush on GFX10 and laterHarish Kasiviswanathan
Heavy-weight TLB flush is required after unmap on all GPUs for correctness and security. Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-09-12drm/i915: Only check eDP HPD when AUX CH is sharedVille Syrjälä
Apparently Acer Chromebook C740 (BDW-ULT) doesn't have the eDP HPD line properly connected, and thus fails the new HPD check during eDP probe. The result is that we lose the eDP output. I suspect all such machines would be Chromebooks or other Linux exclusive systems as the Windows driver likely wouldn't work either. I did check a few other BDW machines here and those do have eDP HPD connected, one of them even is a different Chromebook (Samus). To account for these funky machines let's skip the HPD check when it looks like the eDP port is the only one using that specific AUX channel. In case of multiple ports sharing the same AUX CH (eg. on Asrock B250M-HDV) we still do the check and thus should correctly ignore the eDP port in favor of the other DP port (usually a DP->VGA converter). v2: Don't oops during list iteration Cc: stable@vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9264 Fixes: cfe5bdfb27fa ("drm/i915: Check HPD live state during eDP probe") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230908052527.685-1-ville.syrjala@linux.intel.com Reviewed-by: Luca Coelho <luciano.coelho@intel.com> (cherry picked from commit 70052100fabec5d8c1b09c9959817a2f4517e6b5) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-09-12Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann
Forwarding to v6.6-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-09-11drm/amd/display: Fix 2nd DPIA encoder AssignmentMustapha Ghaddar
[HOW & Why] There seems to be an issue with 2nd DPIA acquiring link encoder for tiled displays. Solution is to remove check for eng_id before we get first dynamic encoder for it Reviewed-by: Cruise Hung <cruise.hung@amd.com> Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Acked-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Mustapha Ghaddar <mghaddar@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amd/display: Add DPIA Link Encoder Assignment FixMustapha Ghaddar
For DPIA we should have preferred DIG assignment based on DPIA selected as per the ASIC design. Reviewed-by: George Shen <george.shen@amd.com> Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Mustapha Ghaddar <mghaddar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-09-11drm/amd/display: fix replay_mode kernel-doc warningRandy Dunlap
Fix the typo in the kernel-doc for @replay_mode to prevent kernel-doc warnings: drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:623: warning: Incorrect use of kernel-doc format: * @replay mode: Replay supported drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:626: warning: Function parameter or member 'replay_mode' not described in 'amdgpu_hdmi_vsdb_info' Fixes: ec8e59cb4e0c ("drm/amd/display: Get replay info from VSDB") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Leo Li <sunpeng.li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdgpu: Handle null atom context in VBIOS info ioctlDavid Francis
On some APU systems, there is no atom context and so the atom_context struct is null. Add a check to the VBIOS_INFO branch of amdgpu_info_ioctl to handle this case, returning all zeroes. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdkfd: Checkpoint and restore queues on GFX11David Francis
The code in kfd_mqd_manager_v11.c to support criu dump and restore of queue state was missing. Added it; should be equivalent to kfd_mqd_manager_v10.c. CC: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amd/display: Adjust the MST resume flowWayne Lin
[Why] In drm_dp_mst_topology_mgr_resume() today, it will resume the mst branch to be ready handling mst mode and also consecutively do the mst topology probing. Which will cause the dirver have chance to fire hotplug event before restoring the old state. Then Userspace will react to the hotplug event based on a wrong state. [How] Adjust the mst resume flow as: 1. set dpcd to resume mst branch status 2. restore source old state 3. Do mst resume topology probing For drm_dp_mst_topology_mgr_resume(), it's better to adjust it to pull out topology probing work into a 2nd part procedure of the mst resume. Will have a follow up patch in drm. Reviewed-by: Chao-kai Wang <stylon.wang@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Acked-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdgpu: fallback to old RAS error message for aqua_vanjaramHawking Zhang
So driver doesn't generate incorrect message until the new format is settled down for aqua_vanjaram Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOVAlex Deucher
Needed for HDP flush to work correctly. Reviewed-by: Timmy Tsai <timmtsai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdgpu/soc21: don't remap HDP registers for SR-IOVAlex Deucher
This matches the behavior for soc15 and nv. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Timmy Tsai <timmtsai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amd/display: Don't check registers, if using AUX BL controlSwapnil Patel
[Why] Currently the driver looks DCN registers to access if BL is on or not. This check is not valid if we are using AUX based brightness control. This causes driver to not send out "backlight off" command during power off sequence as it already thinks it is off. [How] Only check DCN registers if we aren't using AUX based brightness control. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Acked-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Swapnil Patel <swapnil.patel@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdgpu: fix retry loop testDan Carpenter
This loop will exit with "retry" set to -1 if it fails but the code checks for if "retry" is zero. Fix this by changing post-op to a pre-op. --retry vs retry--. Fixes: e01eeffc3f86 ("drm/amd/pm: avoid driver getting empty metrics table for the first time") Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amd/display: Add dirty rect support for ReplayBhawanpreet Lakha
Dirty rect can be used with replay, so enable them to allow for more powersaving. Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Acked-by: Stylon Wang <stylon.wang@amd.com> Signed-off-by: Bhawanpreet Lakha <bhawanpreet.lakha@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory"Hamza Mahfooz
This reverts commit 70e64c4d522b732e31c6475a3be2349de337d321. Since, we now have an actual fix for this issue, we can get rid of this workaround as it can cause pin failures if enough VRAM isn't carved out by the BIOS. Cc: stable@vger.kernel.org # 6.1+ Acked-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amd/display: fix the white screen issue when >= 64GB DRAMYifan Zhang
Dropping bit 31:4 of page table base is wrong, it makes page table base points to wrong address if phys addr is beyond 64GB; dropping page_table_start/end bit 31:4 is unnecessary since dcn20_vmid_setup will do that. Also, while we are at it, cleanup the assignments using upper_32_bits()/lower_32_bits() and AMDGPU_GPU_PAGE_SHIFT. Cc: stable@vger.kernel.org Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354 Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)") Acked-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Co-developed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdkfd: Update CU masking for GFX 9.4.3Mukul Joshi
The CU mask passed from user-space will change based on different spatial partitioning mode. As a result, update CU masking code for GFX9.4.3 to work for all partitioning modes. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdkfd: Update cache info reporting for GFX v9.4.3Mukul Joshi
Update cache info reporting in sysfs to report the correct number of CUs and associated cache information based on different spatial partitioning modes. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdgpu: Store CU info from all XCCs for GFX v9.4.3Mukul Joshi
Currently, we store CU info only for a single XCC assuming that it is the same for all XCCs. However, that may not be true. As a result, store CU info for all XCCs. This info is later used for CU masking. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdkfd: Fix unaligned 64-bit doorbell warningMukul Joshi
This patch fixes the following unaligned 64-bit doorbell warning seen when submitting packets on HIQ on GFX v9.4.3 by making the HIQ doorbell 64-bit aligned. The warning is seen when GPU is loaded in any mode other than SPX mode. [ +0.000301] ------------[ cut here ]------------ [ +0.000003] Unaligned 64-bit doorbell [ +0.000030] WARNING: /amdkfd/kfd_doorbell.c:339 write_kernel_doorbell64+0x72/0x80 [ +0.000003] RIP: 0010:write_kernel_doorbell64+0x72/0x80 [ +0.000004] RSP: 0018:ffffc90004287730 EFLAGS: 00010246 [ +0.000005] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ +0.000003] RDX: 0000000000000001 RSI: ffffffff82837c71 RDI: 00000000ffffffff [ +0.000003] RBP: ffffc90004287748 R08: 0000000000000003 R09: 0000000000000001 [ +0.000002] R10: 000000000000001a R11: ffff88a034008198 R12: ffffc900013bd004 [ +0.000003] R13: 0000000000000008 R14: ffffc900042877b0 R15: 000000000000007f [ +0.000003] FS: 00007fa8c7b62000(0000) GS:ffff889f88400000(0000) knlGS:0000000000000000 [ +0.000004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000003] CR2: 000056111c45aaf0 CR3: 00000001414f2002 CR4: 0000000000770ee0 [ +0.000003] PKRU: 55555554 [ +0.000002] Call Trace: [ +0.000004] <TASK> [ +0.000006] kq_submit_packet+0x45/0x50 [amdgpu] [ +0.000524] pm_send_set_resources+0x7f/0xc0 [amdgpu] [ +0.000500] set_sched_resources+0xe4/0x160 [amdgpu] [ +0.000503] start_cpsch+0x1c5/0x2a0 [amdgpu] [ +0.000497] kgd2kfd_device_init.cold+0x816/0xb42 [amdgpu] [ +0.000743] amdgpu_amdkfd_device_init+0x15f/0x1f0 [amdgpu] [ +0.000602] amdgpu_device_init.cold+0x1813/0x2176 [amdgpu] [ +0.000684] ? pci_bus_read_config_word+0x4a/0x80 [ +0.000012] ? do_pci_enable_device+0xdc/0x110 [ +0.000008] amdgpu_driver_load_kms+0x1a/0x110 [amdgpu] [ +0.000545] amdgpu_pci_probe+0x197/0x400 [amdgpu] Fixes: c31866651086 ("drm/amdgpu: use doorbell mgr for kfd kernel doorbells") Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11drm/amdkfd: Fix reg offset for setting CWSR grace periodMukul Joshi
This patch fixes the case where the code currently passes absolute register address and not the reg offset, which HWS expects, when sending the PM4 packet to set/update CWSR grace period. Additionally, cleanup the signature of build_grace_period_packet_info function as it no longer needs the inst parameter. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11Merge tag 'drm-misc-next-fixes-2023-09-11' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * nouveau: Lockdep workaround * fbdev/g364fb: Build fix Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230911141915.GA983@linux-uq9g
2023-09-10Merge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm ci scripts from Dave Airlie: "This is a bunch of ci integration for the freedesktop gitlab instance where we currently do upstream userspace testing on diverse sets of GPU hardware. From my perspective I think it's an experiment worth going with and seeing how the benefits/noise playout keeping these files useful. Ideally I'd like to get this so we can do pre-merge testing on PRs eventually. Below is some info from danvet on why we've ended up making the decision and how we can roll it back if we decide it was a bad plan. Why in upstream? - like documentation, testcases, tools CI integration is one of these things where you can waste endless amounts of time if you accidentally have a version that doesn't match your source code - but also like the above, there's a balance, this is the initial cut of what we think makes sense to keep in sync vs out-of-tree, probably needs adjustment - gitlab supports out-of-repo gitlab integration and that's what's been used for the kernel in drm, but it results in per-driver fragmentation and lots of duplicated effort. the simple act of smashing an arbitrary winner into a topic branch already started surfacing patches on dri-devel and sparking good cross driver team discussions Why gitlab? - it's not any more shit than any of the other CI - drm userspace uses it extensively for everything in userspace, we have a lot of people and experience with this, including integration of hw testing labs - media userspace like gstreamer is also on gitlab.fd.o, and there's discussion to extend this to the media subsystem in some fashion Can this be shared? - there's definitely a pile of code that could move to scripts/ if other subsystem adopt ci integration in upstream kernel git. other bits are more drm/gpu specific like the igt-gpu-tests/tools integration - docker images can be run locally or in other CI runners Will we regret this? - it's all in one directory, intentionally, for easy deletion - probably 1-2 years in upstream to see whether this is worth it or a Big Mistake. that's roughly what it took to _really_ roll out solid CI in the bigger userspace projects we have on gitlab.fd.o like mesa3d" * tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm: drm: ci: docs: fix build warning - add missing escape drm: Add initial ci/ subdirectory