<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c, branch v5.4.233</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.233</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.233'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-07-19T06:53:07Z</updated>
<entry>
<title>drm/amd/amdgpu/sriov disable all ip hw status by default</title>
<updated>2021-07-19T06:53:07Z</updated>
<author>
<name>Jack Zhang</name>
<email>Jack.Zhang1@amd.com</email>
</author>
<published>2021-04-27T09:08:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=997dedaa75e9ee5a5a3ebff74d458a297ade2162'/>
<id>urn:sha1:997dedaa75e9ee5a5a3ebff74d458a297ade2162</id>
<content type='text'>
[ Upstream commit 95ea3dbc4e9548d35ab6fbf67675cef8c293e2f5 ]

Disable all ip's hw status to false before any hw_init.
Only set it to true until its hw_init is executed.

The old 5.9 branch has this change but somehow the 5.11 kernrel does
not have this fix.

Without this change, sriov tdr have gfx IB test fail.

Signed-off-by: Jack Zhang &lt;Jack.Zhang1@amd.com&gt;
Review-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amd/amdgpu: fix a potential deadlock in gpu reset</title>
<updated>2021-06-03T06:59:12Z</updated>
<author>
<name>Lang Yu</name>
<email>Lang.Yu@amd.com</email>
</author>
<published>2021-05-17T04:47:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=de2bf5de17be2b552d18ddea9a16b233f4f009e3'/>
<id>urn:sha1:de2bf5de17be2b552d18ddea9a16b233f4f009e3</id>
<content type='text'>
[ Upstream commit 9c2876d56f1ce9b6b2072f1446fb1e8d1532cb3d ]

When amdgpu_ib_ring_tests failed, the reset logic called
amdgpu_device_ip_suspend twice, then deadlock occurred.
Deadlock log:

[  805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[  806.290952] [drm] free PSP TMR buffer

[  806.319406] ============================================
[  806.320315] WARNING: possible recursive locking detected
[  806.321225] 5.11.0-custom #1 Tainted: G        W  OEL
[  806.322135] --------------------------------------------
[  806.323043] cat/2593 is trying to acquire lock:
[  806.323825] ffff888136b1cdc8 (&amp;adev-&gt;dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.325668]
               but task is already holding lock:
[  806.326664] ffff888136b1cdc8 (&amp;adev-&gt;dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.328430]
               other info that might help us debug this:
[  806.329539]  Possible unsafe locking scenario:

[  806.330549]        CPU0
[  806.330983]        ----
[  806.331416]   lock(&amp;adev-&gt;dm.dc_lock);
[  806.332086]   lock(&amp;adev-&gt;dm.dc_lock);
[  806.332738]
                *** DEADLOCK ***

[  806.333747]  May be due to missing lock nesting notation

[  806.334899] 3 locks held by cat/2593:
[  806.335537]  #0: ffff888100d3f1b8 (&amp;attr-&gt;mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[  806.337009]  #1: ffff888136b1fd78 (&amp;adev-&gt;reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[  806.339018]  #2: ffff888136b1cdc8 (&amp;adev-&gt;dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.340869]
               stack backtrace:
[  806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G        W  OEL    5.11.0-custom #1
[  806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[  806.344413] Call Trace:
[  806.344849]  dump_stack+0x93/0xbd
[  806.345435]  __lock_acquire.cold+0x18a/0x2cf
[  806.346179]  lock_acquire+0xca/0x390
[  806.346807]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.347813]  __mutex_lock+0x9b/0x930
[  806.348454]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.349434]  ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[  806.350581]  ? _raw_spin_unlock_irqrestore+0x47/0x50
[  806.351437]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.352437]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.353252]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.354064]  mutex_lock_nested+0x1b/0x20
[  806.354747]  ? mutex_lock_nested+0x1b/0x20
[  806.355457]  dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.356427]  ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[  806.357736]  amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[  806.360394]  amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[  806.362926]  amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[  806.365560]  amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]

Signed-off-by: Lang Yu &lt;Lang.Yu@amd.com&gt;
Acked-by: Christian KÃnig &lt;christian.koenig@amd.com&gt;
Reviewed-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix a GPU hang issue when remove device</title>
<updated>2021-01-19T17:26:16Z</updated>
<author>
<name>Dennis Li</name>
<email>Dennis.Li@amd.com</email>
</author>
<published>2020-12-30T11:45:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=644baa95db2b27d4703f123d040c1d83ee058a83'/>
<id>urn:sha1:644baa95db2b27d4703f123d040c1d83ee058a83</id>
<content type='text'>
[ Upstream commit 88e21af1b3f887d217f2fb14fc7e7d3cd87ebf57 ]

When GFXOFF is enabled and GPU is idle, driver will fail to access some
registers. Therefore change to disable power gating before all access
registers with MMIO.

Dmesg log is as following:
amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
amdgpu: cp queue pipe 4 queue 0 preemption failed
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706

Signed-off-by: Dennis Li &lt;Dennis.Li@amd.com&gt;
Reviewed-by: Hawking Zhang &lt;Hawking.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: correct the gpu reset handling for job != NULL case</title>
<updated>2020-11-05T10:43:33Z</updated>
<author>
<name>Evan Quan</name>
<email>evan.quan@amd.com</email>
</author>
<published>2020-10-15T06:57:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d417026c408187cf676d3d2d5920fcdc5eafd079'/>
<id>urn:sha1:d417026c408187cf676d3d2d5920fcdc5eafd079</id>
<content type='text'>
commit 207ac684792560acdb9e06f9d707ebf63c84b0e0 upstream.

Current code wrongly treat all cases as job == NULL.

Signed-off-by: Evan Quan &lt;evan.quan@amd.com&gt;
Reviewed-and-tested-by: Jane Jian &lt;Jane.Jian@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset</title>
<updated>2020-10-01T11:17:54Z</updated>
<author>
<name>Jack Zhang</name>
<email>Jack.Zhang1@amd.com</email>
</author>
<published>2020-04-02T07:10:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cb1dfad688ef9aa7140933a5fe7ffe49b2ee5c48'/>
<id>urn:sha1:cb1dfad688ef9aa7140933a5fe7ffe49b2ee5c48</id>
<content type='text'>
[ Upstream commit 04bef61e5da18c2b301c629a209ccdba4d4c6fbb ]

kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate

Without this change, sriov tdr code path will never free those allocated
memories and get memory leak.

v2:add a bugfix for kiq ring test fail

Signed-off-by: Jack Zhang &lt;Jack.Zhang1@amd.com&gt;
Reviewed-by: Monk Liu &lt;monk.liu@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: drop redundant cg/pg ungate on runpm enter</title>
<updated>2020-05-14T05:58:19Z</updated>
<author>
<name>Evan Quan</name>
<email>evan.quan@amd.com</email>
</author>
<published>2020-04-24T07:36:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f4d20b01eaf68ca1af79532a731e48fdd3352bf9'/>
<id>urn:sha1:f4d20b01eaf68ca1af79532a731e48fdd3352bf9</id>
<content type='text'>
[ Upstream commit f7b52890daba570bc8162d43c96b5583bbdd4edd ]

CG/PG ungate is already performed in ip_suspend_phase1. Otherwise,
the CG/PG ungate will be performed twice. That will cause gfxoff
disablement is performed twice also on runpm enter while gfxoff
enablemnt once on rump exit. That will put gfxoff into disabled
state.

Fixes: b2a7e9735ab286 ("drm/amdgpu: fix the hw hang during perform system reboot and reset")
Signed-off-by: Evan Quan &lt;evan.quan@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: move kfd suspend after ip_suspend_phase1</title>
<updated>2020-05-14T05:58:19Z</updated>
<author>
<name>Evan Quan</name>
<email>evan.quan@amd.com</email>
</author>
<published>2020-04-24T07:32:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c973b108912a1a8889c9835ef97ec1480d3acd17'/>
<id>urn:sha1:c973b108912a1a8889c9835ef97ec1480d3acd17</id>
<content type='text'>
[ Upstream commit c457a273e118bb96e1db8d1825f313e6cafe4258 ]

This sequence change should be safe as what did in ip_suspend_phase1
is to suspend DCE only. And this is a prerequisite for coming
redundant cg/pg ungate dropping.

Fixes: 487eca11a321ef ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Evan Quan &lt;evan.quan@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix the hw hang during perform system reboot and reset</title>
<updated>2020-04-21T07:05:02Z</updated>
<author>
<name>Prike Liang</name>
<email>Prike.Liang@amd.com</email>
</author>
<published>2020-04-13T13:41:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e2b80bf26956f7b8e2271bc772a28a75d9978a4d'/>
<id>urn:sha1:e2b80bf26956f7b8e2271bc772a28a75d9978a4d</id>
<content type='text'>
commit b2a7e9735ab2864330be9d00d7f38c961c28de5d upstream.

The system reboot failed as some IP blocks enter power gate before perform
hw resource destory. Meanwhile use unify interface to set device CGPG to ungate
state can simplify the amdgpu poweroff or reset ungate guard.

Fixes: 487eca11a321ef ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Tested-by: Mengbing Wang &lt;Mengbing.Wang@amd.com&gt;
Tested-by: Paul Menzel &lt;pmenzel@molgen.mpg.de&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>drm/amdgpu: fix gfx hang during suspend with video playback (v2)</title>
<updated>2020-04-17T08:50:25Z</updated>
<author>
<name>Prike Liang</name>
<email>Prike.Liang@amd.com</email>
</author>
<published>2020-04-07T12:21:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bdac1d76a310578b3ad081df2db97da59e241010'/>
<id>urn:sha1:bdac1d76a310578b3ad081df2db97da59e241010</id>
<content type='text'>
[ Upstream commit 487eca11a321ef33bcf4ca5adb3c0c4954db1b58 ]

The system will be hang up during S3 suspend because of SMU is pending
for GC not respose the register CP_HQD_ACTIVE access request.This issue
root cause of accessing the GC register under enter GFX CGGPG and can
be fixed by disable GFX CGPG before perform suspend.

v2: Use disable the GFX CGPG instead of RLC safe mode guard.

Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Tested-by: Mengbing Wang &lt;Mengbing.Wang@amd.com&gt;
Reviewed-by: Huang Rui &lt;ray.huang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV</title>
<updated>2020-02-24T07:36:38Z</updated>
<author>
<name>Monk Liu</name>
<email>Monk.Liu@amd.com</email>
</author>
<published>2019-12-17T10:16:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=43ef7ad610dc30996ff26436a9652dbf35d37d51'/>
<id>urn:sha1:43ef7ad610dc30996ff26436a9652dbf35d37d51</id>
<content type='text'>
[ Upstream commit 5a7489a7e189ee2be889485f90c8cf24ea4b9a40 ]

issues:
MEC is ruined by the amdkfd_pre_reset after VF FLR done

fix:
amdkfd_pre_reset() would ruin MEC after hypervisor finished the VF FLR,
the correct sequence is do amdkfd_pre_reset before VF FLR but there is
a limitation to block this sequence:
if we do pre_reset() before VF FLR, it would go KIQ way to do register
access and stuck there, because KIQ probably won't work by that time
(e.g. you already made GFX hang)

so the best way right now is to simply remove it.

Signed-off-by: Monk Liu &lt;Monk.Liu@amd.com&gt;
Reviewed-by: Emily Deng &lt;Emily.Deng@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
