summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/xe/regs
AgeCommit message (Collapse)Author
2026-02-23drm/xe/wa: Steer RMW of MCR registers while building default LRCMatt Roper
When generating the default LRC, if a register is not masked, we apply any save-restore programming necessary via a read-modify-write sequence that will ensure we only update the relevant bits/fields without clobbering the rest of the register. However some of the registers that need to be updated might be MCR registers which require steering to a non-terminated instance to ensure we can read back a valid, non-zero value. The steering of reads originating from a command streamer is controlled by register CS_MMIO_GROUP_INSTANCE_SELECT. Emit additional MI_LRI commands to update the steering before any RMW of an MCR register to ensure the reads are performed properly. Note that needing to perform a RMW of an MCR register while building the default LRC is pretty rare. Most of the MCR registers that are part of an engine's LRCs are also masked registers, so no MCR is necessary. Fixes: f2f90989ccff ("drm/xe: Avoid reading RMW registers in emit_wa_job") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patch.msgid.link/20260206223058.387014-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit 6c2e331c915ba9e774aa847921262805feb00863) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-14drm/xe/mert: Improve handling of MERT CAT errorsMichal Wajdeczko
All MERT catastrophic errors but VF's LMTT fault are serious, so we shouldn't limit our handling only to print debug messages. Change CATERR message to error level and then declare the device as wedged to match expectation from the design document. For the LMTT faults, add a note about adding tracking of this unexpected VF activity. While at it, rename register fields defnitions to match the BSpec. Also drop trailing include guard name from the regs.h file. BSpec: 74625 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260112183716.28700-1-michal.wajdeczko@intel.com
2026-01-12drm/xe/hwmon: Expose individual VRAM channel temperatureKarthik Poosa
Expose individual VRAM temperature attributes. Update Xe hwmon documentation for this entry. v2: - Avoid using default switch case for VRAM individual temperatures. - Append labels with VRAM channel number. - Update kernel version in Xe hwmon documentation. v3: - Add missing brackets in Xe hwmon documentation from VRAM channel sysfs. - Reorder BMG_VRAM_TEMPERATURE_N macro in xe_pcode_regs.h. - Add api to check if VRAM is available on the channel. v4: - Improve VRAM label handling to eliminate temp variable by introducing a dedicated array vram_label in xe_hwmon_thermal_info. - Remove a magic number. - Change the label from vram_X to vram_ch_X. v5: - Address review comments from Raag. - Change vram to VRAM in commit title and subject. - Refactor BMG_VRAM_TEMPERATURE_N macro. - Refactor is_vram_ch_available(). - Rephrase a comment. - Check individual VRAM temperature limits in addition to VRAM availability in xe_hwmon_temp_is_visible. (Raag) - Move VRAM label change out of this patch. v6: - Use in_range() for VRAM_N index check instead of if check. (Raag) - Minor aesthetic changes. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://patch.msgid.link/20260112203521.1014388-5-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-09drm/xe: Allow compressible surfaces to be 1-way coherentXin Wang
Previously, compressible surfaces were required to be non-coherent (allocated as WC) because compression and coherency were mutually exclusive. Starting with Xe3, hardware supports combining compression with 1-way coherency, allowing compressible surfaces to be allocated as WB memory. This provides applications with more efficient memory allocation by avoiding WC allocation overhead that can cause system stuttering and memory management challenges. The implementation adds support for compressed+coherent PAT entry for the xe3_lpg devices and updates the driver logic to handle the new compression capabilities. v2: (Matthew Auld) - Improved error handling with XE_IOCTL_DBG() - Enhanced documentation and comments - Fixed xe_bo_needs_ccs_pages() outdated compression assumptions v3: - Improve WB compression support detection by checking PAT table instead of version check v4: - Add XE_CACHE_WB_COMPRESSION, which simplifies the logic. v5: - Use U16_MAX for the invalid PAT index. (Matthew Auld) Bspec: 71582, 59361, 59399 Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Xin Wang <x.wang@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260109093007.546784-1-x.wang@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-23drm/xe/soc_remapper: Add system controller config for SoC remapperUmesh Nerlige Ramappa
Define system controller config bits and helpers for SoC remapper. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patch.msgid.link/20251223183943.3175941-8-umesh.nerlige.ramappa@intel.com
2025-12-23drm/xe/soc_remapper: Use SoC remapper helper from VSEC codeUmesh Nerlige Ramappa
Since different drivers can use SoC remapper, modify VSEC code to access SoC remapper via a helper that would synchronize such accesses. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://patch.msgid.link/20251223183943.3175941-7-umesh.nerlige.ramappa@intel.com
2025-12-16drm/xe/oa/uapi: Expose MERT OA unitAshutosh Dixit
A MERT OA unit is available in the SoC on some platforms. Add support for this OA unit and expose it to userspace. The MERT OA unit does not have any HW engines attached, but is otherwise similar to an OAM unit. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20251205212613.826224-2-ashutosh.dixit@intel.com
2025-12-12drm/xe: Track pre-production workaround supportMatt Roper
When we're initially enabling driver support for a new platform/IP, we usually implement all workarounds documented in the WA database in the driver. Many of those workarounds are restricted to early steppings that only showed up in pre-production hardware (i.e., internal test chips that are not available to the general public). Since the workarounds for early, pre-production steppings tend to be some of the ugliest and most complicated workarounds, we generally want to eliminate them and simplify the code once the platform has launched and our internal usage of those pre-production parts have been phased out. Let's add a flag to the device info that tracks which platforms still have support for pre-production workarounds for so that we can print a warning and taint if someone tries to load the driver on a pre-production part for a platform without pre-production workarounds. This will help our internal users understand the likely problems they'll encounter if they try to load the driver on an old pre-production device. The Xe behavior here is similar to what we've done for many years on i915 (see intel_detect_preproduction_hw()), except that instead of manually coding up ranges of device steppings that we believe to be pre-production hardware, Xe will use the hardware's own production vs pre-production fusing status, which we can read from the FUSE2 register. This fuse didn't exist on older Intel hardware, but should be present on all platforms supported by the Xe driver. Going forward, let's set the expectation that we'll start looking into removing pre-production workarounds for a platform around the time that platforms of the next major IP stepping are having their force_probe requirement lifted. This timing is just a rough guideline; there may be cases where some instances of pre-production parts are still being actively used in CI farms, internal device pools, etc. and we'll need to wait a bit longer for those to be swapped out. v2: - Fix inverted forcewake check v3: - Invert flag and add it to the platforms on which we still have pre-prod workarounds. (Jani, Lucas) v4: - Avoid checking pre-production on VF since they don't have access to the FUSE2 register. Bspec: 78271, 52544 Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/20251212181411.294854-3-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-12drm/xe: Create page reclaim list on unbindBrian Nguyen
Page reclaim list (PRL) is preparation work for the page reclaim feature. The PRL is firstly owned by pt_update_ops and all other page reclaim operations will point back to this PRL. PRL generates its entries during the unbind page walker, updating the PRL. This PRL is restricted to a 4K page, so 512 page entries at most. v2: - Removed unused function. (Shuicheng) - Compacted warning checking, update commit message, spelling, etc. (Shuicheng, Matthew B) - Fix kernel docs - Moved PRL max entries overflow handling out from generate_reclaim_entry to caller (Shuicheng) - Add xe_page_reclaim_list_init for clarity. (Matthew B) - Modify xe_guc_page_reclaim_entry to use macros for greater flexbility. (Matthew B) - Add fallback for PTE outside of page reclaim supported 4K, 64K, 2M pages (Matthew B) - Invalidate PRL for early abort page walk. - Removed page reclaim related variables from tlb fence (Matthew Brost) - Remove error handling in *alloc_entries failure. (Matthew B) v3: - Fix NULL pointer dereference check. - Modify reclaim_entry to QW and bitfields accordingly. (Matthew B) - Add vm_dbg prints for PRL generation and invalidation. (Matthew B) v4: - s/GENMASK/GENMASK_ULL && s/BIT/BIT_ULL (CI) v5: - Addition of xe_page_reclaim_list_is_new() to avoid continuous allocation of PRL if consecutive VMAs cause a PRL invalidation. - Add xe_page_reclaim_list_valid() helpers for clarity. (Matthew B) - Move xe_page_reclaim_list_entries_put in xe_page_reclaim_list_invalidate. Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251212213225.3564537-17-brian3.nguyen@intel.com
2025-12-04drm/xe/rtp: Whitelist OAM MMIO trigger registersAshutosh Dixit
Whitelist OAM registers to enable userspace to execute MMIO triggers on OAM units. Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20251202025115.373546-6-ashutosh.dixit@intel.com
2025-12-01drm/xe/xe3_lpg: Apply Wa_16028005424Balasubramani Vivekanandan
Applied Wa_16028005424 to Graphics version from 30.00 to 30.05 Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patch.msgid.link/20251121100822.20076-2-balasubramani.vivekanandan@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-11-25drm/xe/pf: Handle MERT catastrophic errorsLukasz Laguna
The MERT block triggers an interrupt when a catastrophic error occurs. Update the interrupt handler to read the MERT catastrophic error type and log appropriate debug message. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251124190237.20503-5-lukasz.laguna@intel.com
2025-11-25drm/xe/pf: Add TLB invalidation support for MERTLukasz Laguna
Add support for triggering and handling MERT TLB invalidation. After LMTT updates, the MERT TLB invalidation is initiated to ensure memory translations remain coherent. Completion of the invalidation is signaled via MERT interrupt (bit 13 in the GFX master interrupt register). Detect and handle this interrupt to properly synchronize the invalidation flow. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251124190237.20503-4-lukasz.laguna@intel.com
2025-11-25drm/xe/pf: Configure LMTT in MERTLukasz Laguna
On platforms with standalone MERT, the PF driver needs to program LMTT in MERT's LMEM_CFG register. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251124190237.20503-3-lukasz.laguna@intel.com
2025-11-10drm/xe: Use SG_TILE_ADDR_RANGE instead of TILE_ADDR_RANGEFei Yang
The TILE_ADDR_RANGE register is not available on all platforms going forward as it was deprecated and is being replaced by equivalent registers within SoC MMIO space. While that doesn't happen, the SG_TILE_ADDR_RANGE (base 0x1083a0) is still valid for all platforms supported by xe. Use that instead. BSpec: 59353, 54991 Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/20251107-tile-addr-v1-1-a3014aadc2e7@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-30drm/xe/cri: Add new performance limit reasons bitsSk Anirban
Crescent Island has some additional and different bits for performance limit reasons. Add the new definitions and use them for CRI. Signed-off-by: Sk Anirban <sk.anirban@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://patch.msgid.link/20251029-gt-throttle-cri-v3-1-d1f5abbb8114@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-29drm/xe/xe_debugfs: Expose G7 package state residency counter through debugfsMohammed Thasleem
Add G7 package state residency counter in debugfs alongside existing G2,G6,G8,G10 states for complete power state visibility. Signed-off-by: Mohammed Thasleem <mohammed.thasleem@intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Link: https://patch.msgid.link/20251016001219.37684-1-mohammed.thasleem@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-27drm/xe/xe3: Add WA_14024681466 for Xe3_LPGNitin Gote
Apply WA_14024681466 to Xe3_LPG graphics IP versions from 30.00 to 30.05. v2: (Matthew Roper) - Remove stepping filter as workaround applies to all steppings. - Add an engine class filter so it only applies to the RENDER engine. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://patch.msgid.link/20251027092643.335904-1-nitin.r.gote@intel.com Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-10-20drm/xe/xe3p_lpm: Add special check in Media GT for Main GAMCTRLBalasubramani Vivekanandan
For Xe3p arch some subunits of an IP may be different. The GMD_ID register returns the Xe3p arch and dedicates the reserved field to mark possible subunit differences. Generally this is an under-the-hood implementation detail that drivers don't need to worry about, but the new Main_GAMCTRL may be enabled or not depending on those. Those reserved bits are described for Xe3p as: "If Zero, No special case to be handled. If Non-Zero, special case to be handled by Software agent.". That special case is defined per Arch. So if media version is 35, also check the additional reserved bits. To avoid confusion with the usual meaning of "reserved", define them as GMD_ID_SUBIP_FLAG_MASK. Bspec: 74201 Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251019-xe3p-gamctrl-v1-2-ad66d3c1908f@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-20drm/xe/xe3p_lpm: Configure MAIN_GAMCTRL_QUEUE_SELECTBrian Welty
Starting from Xe3p, there are two different copies of some of the GAM registers: the traditional MCR variant at their old locations, and a new unicast copy known as "main_gamctrl." The Xe driver doesn't use these registers directly, but we need to instruct the GuC on which set it should use. Since the new, unicast registers are preferred (since they avoid the need for unnecessary MCR synchronization), set a new GuC feature flag, GUC_CTL_MAIN_GAMCTRL_QUEUES to convey this decision. A new helper function, xe_guc_using_main_gamctrl_queues(), is added for use in the 3 independent places that need to handle configuration of the new reporting queues. The mmio write to enable the main gamctl is only done during the general GuC upload. The gamctrl registers are not accessed by the GuC during hwconfig load. Last, the ADS blob for communicating the queue addresses contains both a DPA and GGTT offset. The GuC documentation states that DPA is now MBZ when using the MAIN_GAMCTRL queues. Bspec: 76445, 73540 Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251019-xe3p-gamctrl-v1-1-ad66d3c1908f@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-18drm/xe/xe3p_xpc: Add support for compute walker for non-MSIxLucas De Marchi
Current implementation of compute walker has dependency on GPU/SW Stack which requires SW/UMD to wait for event from KMD to indicate PIPE_CONTROL interrupt was done. This created latency on SW stack. This feature adds support to generate completion interrupt from GPGPU walker which does not support MSIx and avoid software using Pipe control drain/idle latency. The only thing needed for the kernel driver to do here is to wakeup the thread waiting on the ufence, which is already handled by the irq handler. Before waiting on this event, the userspace side can opt-in to this interrupt being generated by the HW by selecting the flag in the POST_SYNC_DATA_2 substructure's dw0[3] of COMPUTE_WALKER_2 instruction. Bspec: 62346, 74334 Suggested-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: S A Muqthyar Ahmed <syed.abdul.muqthyar.ahmed@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251016-xe3p-v3-21-3dd173a3097a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-18drm/xe/irq: Check fuse mask for media enginesLucas De Marchi
Just like the other engines, check xe_hw_engine_mask_per_class() for VCS and VECS to account for architectural availability of those registers. With that, all the possibly available media engines can have their interrupts enabled. Bspec: 54030 Suggested-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251016-xe3p-v3-20-3dd173a3097a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-18drm/xe/irq: Rename bits used with all enginesLucas De Marchi
Two bit fields have similar functionality across the interrupt vectors but are named "RENDER". Rename them to follow the bspec more closely and clear any confusion when using them for other engines. Bspec: 62353, 62354, 62355, 62346, 62345, 63341 Suggested-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251016-xe3p-v3-19-3dd173a3097a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-17drm/xe/xe3p: Dump CSMQDEBUG registerWang Xin
The CSMQDEBUG is useful for the development of MQ feature. Start dumping the debug register. Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Wang Xin <x.wang@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251016-xe3p-v3-10-3dd173a3097a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-17drm/xe: Dump CURRENT_LRCA registerWang Xin
Add CURRENT_LRCA to register dump to help debugging. Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Wang Xin <x.wang@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251016-xe3p-v3-9-3dd173a3097a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-17drm/xe/xe3p: Determine service copy availability from fuseMatt Roper
Xe3p introduces a dedicated SERVICE_COPY_ENABLE fuse register to reflect the availability of the service copy engines (BCS1-BCS8). Bspec: 74624 Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20251016-xe3p-v3-8-3dd173a3097a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-17drm/xe: Drop CTC_MODE register readBalasubramani Vivekanandan
The warning was added for a condition that never triggered even for platforms prior to Xe2. It's not supported in Xe2 and in Xe3p the register is removed from the main GT. Just drop the entire function as it doesn't bring any benefit. Bspec: 62395 Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> [ Drop the entire check for CTC_MODE ] Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251016-xe3p-v3-3-3dd173a3097a@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13drm/xe/i2c: Wire up reset/postinstall for I2C IRQRaag Jadav
I2C IRQ needs to be routed to SGUnit or PUnit for the devices that support it. Wire up reset/postinstall handles for I2C IRQ to take care of this. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20251011123509.3233213-3-raag.jadav@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-10drm/xe: Enable media sampler power gatingVinay Belgaumkar
Where applicable, enable media sampler power gating. Also, add it to the powergate_info debugfs. v2: Remove the sampler powergate status since it is cleared quickly anyway. v3: Use vcs mask (Rodrigo) and fix the version check for media v4: Remove extra spaces v5: Media samplers are independent of vcs mask, use Media version 1255 (Matt Roper) Fixes: 38e8c4184ea0 ("drm/xe: Enable Coarse Power Gating") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20251010011047.2047584-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-18drm/xe/lrc: Allow INDIRECT_CTX for more engine classesLucas De Marchi
Currently it's only allowed for render and compute. Going forward we want to enable it for more engine classes. Let the XE_LRC_FLAG_INDIRECT_CTX flag (and thus gt_engine_needs_indirect_ctx()) be the deciding factor for its availability. While at it, add the missing const to rcs_funcs array. Since CTX_INDIRECT_CTX_OFFSET_DEFAULT already matches the HW default and gt_engine_needs_indirect_ctx() only ever enables it for rcs/ccs, there is no change in behavior, it's only preparation for future use case. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-5-306bddbc15da@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-05drm/xe/xe2hpg: Add Wa_18041344222 for Xe2_HPGHarish Chegondi
Add Wa_18041344222 for Xe2_HPG that requires disabling the perf mode for subslice count for eustall sampling when the enabled slices are discontiguous. Bspec: 79483, 56024 Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/b6a631a13a9fb7360e89d679e0797fae42d5a09e.1756855529.git.harish.chegondi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-08-26drm/xe/xe_hw_error: Handle CSC Firmware reported Hardware errorsRiana Tauro
Add support to handle CSC firmware reported errors. When CSC firmware errors are encoutered, a error interrupt is received by the GFX device as a MSI interrupt. Device Source control registers indicates the source of the error as CSC The HEC error status register indicates that the error is firmware reported Depending on the type of error, the error cause is written to the HEC Firmware error register. On encountering such CSC firmware errors, the graphics device is non-recoverable from driver context. The only way to recover from these errors is firmware flash. System admin/userspace is notified of the necessity of firmware flash with a combination of vendor-specific drm device edged uevent, dmesg logs and runtime survivability sysfs. It is the responsiblity of the consumer to verify all the actions and then trigger a firmware flash using tools like fwupd. $ udevadm monitor --property --kernel monitor will print the received events for: KERNEL - the kernel uevent KERNEL[754.709341] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 (drm) ACTION=change DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 SUBSYSTEM=drm WEDGED=vendor-specific DEVNAME=/dev/dri/card0 DEVTYPE=drm_minor SEQNUM=5973 MAJOR=226 MINOR=0 Logs xe 0000:03:00.0: [drm] *ERROR* [Hardware Error]: Tile0 reported NONFATAL error 0x20000 xe 0000:03:00.0: [drm] *ERROR* [Hardware Error]: NONFATAL: HEC Uncorrected FW FD Corruption error reported, bit[2] is set xe 0000:03:00.0: Runtime Survivability mode enabled xe 0000:03:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:03:00.0 as wedged. IOCTLs and executions are blocked. Only a rebind may clear the failure Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new xe 0000:03:00.0: [drm] device wedged, needs recovery xe 0000:03:00.0: Firmware flash required, Please refer to the userspace documentation for more details! Runtime survivability Sysfs: /sys/bus/pci/devices/<device>/survivability_mode v2: use vendor recovery method with runtime survivability (Christian, Rodrigo, Raag) v3: move declare wedged to runtime survivability mode (Rodrigo) v4: update commit message Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-10-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26drm/xe: Add support to handle hardware errorsRiana Tauro
Gfx device reports two classes of errors: uncorrectable and correctable. Depending on the severity uncorrectable errors are further classified Non-Fatal and Fatal. Correctable and Non-Fatal errors: These errors are reported as MSI. Bits in the Master Interrupt Register indicate the class of the error. The source of the error is then read from the Device Error Source Register. Fatal errors: These are reported as PCIe errors When a PCIe error is asserted, the OS will perform a SBR (Secondary Bus reset) which causes the driver to reload. The error registers are sticky and the values are maintained through SBR. Add basic support to handle these errors. Bspec: 50875, 53073, 53074, 53075, 53076 v2: Format commit message (Umesh) v3: fix documentation (Stuart) Cc: Stuart Summers <stuart.summers@intel.com> Co-developed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-9-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-12drm/xe/pf: Set VF LMEM BAR sizeMichał Winiarski
LMEM is partitioned between multiple VFs and we expect that the more VFs we have, the less LMEM is assigned to each VF. This means that we can achieve full LMEM BAR access without the need to attempt full VF LMEM BAR resize via pci_resize_resource(). Always try to set the largest possible BAR size that allows to fit the number of enabled VFs and inform the user in case the resize attempt is not successful. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250527120637.665506-7-michal.winiarski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-25drm/xe/xelp: Add Wa_18022495364Tvrtko Ursulin
Add Wa_18022495364 as a context workaround batch buffer workaround. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-9-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-24drm/xe: Rename MCFG_MCR_SELECTOR to STEER_SEMAPHORENitin Gote
The register at offset 0xfd0 was incorrectly named MCFG_MCR_SELECTOR, likely copied from i915. According to the hardware specification (Bspec), this register is actually called STEER_SEMAPHORE. Rename the register definition and update its usage in xe_gt_mcr.c to match the official hardware documentation. No functional changes. v2: Add Bspec reference (Tejas) Bspec: 67113 Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250723141039.3848390-1-nitin.r.gote@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17drm/xe/xe_debugfs: Exposure of G-State and pcie link state residency ↵Soham Purkait
counters through debugfs Add debug nodes, "dgfx_pkg_residencies" for G-states (G2, G6, G8, G10, ModS) and "dgfx_pcie_link_residencies" for PCIe link states(L0, L1, L1.2) residency counters. v1: - Expose all G-State residency counter values under dgfx_pkg_residencies. (Anshuman) - Include runtime_get/put. (Riana) v2: - Move offset macros to drm/xe/regs/xe_pmt. (Riana) v3: - Include debugfs node "dgfx_pcie_link_residencies" for pcie link residency counter values. (Anshuman) v4: - Include check for BMG and add helper function for repetitive code. (Riana) - Add for loop and local struct to avoid repetition. (Riana) - Use "drm_debugfs_create_files" to create debugfs. (Karthik) v5: - Reorder commits to reflect the correct dependency hierarchy. (Jonathan) - Simplification of commit message and rectified register offset.(Karthik) - Error handling and return before printing. (Riana) v6: - Remove check for DGFX as BMG is discrete. (Karthik) - Rearrange residency offsets in ascending order. (Riana) v7: - Squash the macros into the patch they are used in. (Lucas) Signed-off-by: Soham Purkait <soham.purkait@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://lore.kernel.org/r/20250716101412.3062780-2-soham.purkait@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-14drm/xe: Update register definitions in LRC layout headerXin Wang
Update the register definitions in xe_lrc_layout.h to align with the official hardware specification (Bspec) terminology. Specifically: - rename PVC_CTX_ACC_CTR_THOLD to CTX_ACC_CTR_THOLD - rename PVC_CTX_ASID to CTX_ASID Signed-off-by: Xin Wang <x.wang@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250711060924.7373-1-x.wang@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14drm/xe: Add plumbing for indirect context workaroundsTvrtko Ursulin
Some upcoming workarounds need to be emitted from the indirect workaround context so lets add some plumbing where they will be able to easily slot in. No functional changes for now since everything is still deactivated. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Bspec: 45954 Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250711160153.49833-7-tvrtko.ursulin@igalia.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10drm/xe/pm: Wire up suspend/resume for I2C controllerRaag Jadav
Wire up suspend/resume handles for I2C controller to match its power state with SGUnit. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-5-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-07-10drm/xe: Support for I2C attached MCUsHeikki Krogerus
Adding adaption/glue layer where the I2C host adapter (Synopsys DesignWare I2C adapter) and the I2C clients (the microcontroller units) are enumerated. The microcontroller units (MCU) that are attached to the GPU depend on the OEM. The initially supported MCU will be the Add-In Management Controller (AMC). Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250701122252.2590230-4-heikki.krogerus@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo fixed the co-developed tags and SPDX format in the .c file]
2025-06-23drm/xe/nvm: add support for access modeAlexander Usyskin
Check NVM access mode from GSC FW status registers and overwrite access status read from SPI descriptor, if needed. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Link: https://lore.kernel.org/r/20250617145159.3803852-8-alexander.usyskin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-18drm/xe/hwmon: Fix xe_hwmon_power_max_writeKarthik Poosa
Prevent other bits of mailbox power limit from being overwritten with 0. This issue was due to a missing read and modify of current power limit, before setting a requested mailbox power limit, which is added in this patch. v2: - Improve commit message. (Anshuman) v3: - Rebase. - Rephrase commit message. (Riana) - Add read-modify-write variant of xe_hwmon_pcode_write_power_limit() i.e. xe_hwmon_pcode_rmw_power_limit(). (Badal) - Use xe_hwmon_pcode_rmw_power_limit() to set mailbox power limits. - Remove xe_hwmon_pcode_write_power_limit() as all mailbox power limits writes use xe_hwmon_pcode_rmw_power_limit() only. v4: - Use PWR_LIM in place of (PWR_LIM_EN | PWR_LIM_VAL) wherever applicable. (Riana) Fixes: 7596d839f6228 ("drm/xe/hwmon: Add support to manage power limits though mailbox") Reviewed-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250617120030.612819-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-17drm/xe/oa: Enable OAM latency measurementAshutosh Dixit
Enable OAM latency measurement for Xe3+ platforms. Bspec: 58840 v2: Introduce DRM_XE_OA_UNIT_TYPE_OAM_SAG v3: Also add LNCF_MISC_CONFIG_REGISTER0 needed by MDAPI Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250606192618.4133817-6-ashutosh.dixit@intel.com
2025-05-30drm/xe/hwmon: Read energy status from PMTKarthik Poosa
Read card and package energy status using pmt apis instead of xe_mmio for supported platforms. Enable Battlemage to read energy from PMT. v2: - Remove unused has_pmt_energy field. (Badal) - Use GENMASK to extract energy data. (Badal) v3: - Move PMT energy register offset and GENMASK to xe_pmt.h - Address review comments. (Jani) v4: - Remove unnecessary debug print. (Badal) v5: - Resolve an unused variable warning. - Add a return value check. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250529163458.2354509-6-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-05-30drm/xe/hwmon: Add support to manage power limits though mailboxKarthik Poosa
Add support to manage power limits using pcode mailbox commands for supported platforms. v2: - Address review comments. (Badal) - Use mailbox commands instead of registers to manage power limits for BMG. - Clamp the maximum power limit to GPU firmware default value. v3: - Clamp power limit in write also for platforms with mailbox support. v4: - Remove unnecessary debug prints. (Badal) v5: - Update description of variable pl1_on_boot to fix kernel-doc error. v6: - Improve commit message, refer to BIOS as GPU firmware. - Change macro READ_PL_FROM_BIOS to READ_PL_FROM_FW. - Rectify drm_warn to drm_info. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Fixes: e90f7a58e659 ("drm/xe/hwmon: Add HWMON support for BMG") Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250529163458.2354509-2-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-05-12drm/xe: Add WA BB to capture active context utilizationUmesh Nerlige Ramappa
Context Timestamp (CTX_TIMESTAMP) in the LRC accumulates the run ticks of the context, but only gets updated when the context switches out. In order to check how long a context has been active before it switches out, two things are required: (1) Determine if the context is running: To do so, we program the WA BB to set an initial value for CTX_TIMESTAMP in the LRC. The value chosen is 1 since 0 is the initial value when the LRC is initialized. During a query, we just check for this value to determine if the context is active. If the context switched out, it would overwrite this location with the actual CTX_TIMESTAMP MMIO value. Note that WA BB runs as the last part of the context restore, so reusing this LRC location will not clobber anything. (2) Calculate the time that the context has been active for: The CTX_TIMESTAMP ticks only when the context is active. If a context is active, we just use the CTX_TIMESTAMP MMIO as the new value of utilization. While doing so, we need to read the CTX_TIMESTAMP MMIO for the specific engine instance. Since we do not know which instance the context is running on until it is scheduled, we also read the ENGINE_ID MMIO in the WA BB and store it in the PPHSWP. Using the above 2 instructions in a WA BB, capture active context utilization. v2: (Matt Brost) - This breaks TDR, fix it by saving the CTX_TIMESTAMP register "drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value" - Drop tile from LRC if using gt "drm/xe: Save the gt pointer in LRC and drop the tile" v3: - Remove helpers for bb_per_ctx_ptr (Matt) - Add define for context active value (Matt) - Use 64 bit CTX TIMESTAMP for platforms that support it. For platforms that don't, live with the rare race. (Matt, Lucas) - Convert engine id to hwe and get the MMIO value (Lucas) - Correct commit message on when WA BB runs (Lucas) v4: - s/GRAPHICS_VER(...)/xe->info.has_64bit_timestamp/ (Matt) - Drop support for active utilization on a VF (CI failure) - In xe_lrc_init ensure the lrc value is 0 to begin with (CI regression) v5: - Minor checkpatch fix - Squash into previous commit and make TDR use 32-bit time - Update code comment to match commit msg Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4532 Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250509161159.2173069-8-umesh.nerlige.ramappa@intel.com
2025-05-12drm/xe/xe2hpg: Add Wa_22021007897Aradhya Bhatia
Add Wa_22021007897 for the Xe2_HPG (graphics version: 20.01) IP. It is a permanent workaround, and applicable on all the steppings. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Link: https://lore.kernel.org/r/20250512065004.2576-1-aradhya.bhatia@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-04-09drm/xe: remove unused LE_COSShuicheng Lin
The LE_COS definition missed passing the value parameter to REG_FIELD_PREP. This didn't cause build errors because the entire macro was unused. The value for this field is universally "0" for every MOCS entry on the old Xe_LP platforms, and the whole field has been removed from Xe_HP onward. Just delete the line so that we don't have an unused definition. Suggested-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250405171539.599850-1-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-04-04drm/xe/xe2hpg: Add Wa_16025250150Aradhya Bhatia
Add Wa_16025250150 for the Xe2_HPG (graphics version: 20.01) platforms. It is a permanent workaround, and applicable on all the steppings. Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250325134421.1489416-1-aradhya.bhatia@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>