Forums - GPU Crash when running in temp range 9C to ~12C

kim.steiner
Join Date: 16 Jan 18
Posts: 2
Posted: Thu, 2022-02-10 06:23

Hi. We have a Dragonboard820c running a 4.14 kernel. We monitor the temperature of the Snapdragon 820 by reading /sys/class/thermal/thermal_zone0/temp. When the chip temperature is in the range of 7C to ~12C, the board will sometimes crash. The dmesg log then shows a lot of messages like these:

[email protected]-alip:/sys/class/thermal# cat thermal_zone0/temp
7300
[email protected]-alip:/sys/class/thermal# cat thermal_zone0/temp
7000
[email protected]-alip:/sys/class/thermal# [  295.414124] msm 900000.mdss: gpu fault ring 1 fence 13b85 status C0040101 rb 1aad/1aad ib1 00000000034CA000/0000 ib2 0000000003274000/0000
[  295.414181] msm 900000.mdss: A530: hangcheck recover!
[  295.430749] msm 900000.mdss: gpu fault ring 1 fence 13b88 status C00401C3 rb 1b10/1b1c ib1 0000000003876000/0000 ib2 0000000003274000/0000
[  295.430809] msm 900000.mdss: A530: hangcheck recover!
[  295.442982] msm 900000.mdss: A530: offending task: X:flush_queue0 (/usr/lib/xorg/Xorg -nolisten tcp -auth /var/run/sddm/{7f7de7ac-f4b2-4c4d-a384-ee1fe3330703} -background none -noreset -displayfd 17 -seat seat0 vt7)
[  295.448138] revision: 530 (5.3.0.2)
[  295.466960] rb 0: fence:    0/0
[  295.470400] rptr:     39
[  295.473553] rb wptr:  39
[  295.476325] rb 1: fence:    80773/80776
[  295.478820] rptr:     6848
[  295.482403] rb wptr:  6940
[  295.485183] rb 2: fence:    0/0
[  295.487871] rptr:     0
[  295.490883] rb wptr:  0
[  295.493340] rb 3: fence:    0/0
[  295.495768] rptr:     0
[  295.498870] rb wptr:  0
[  295.501329] CP_SCRATCH_REG0: 0
[  295.503758] CP_SCRATCH_REG1: 0
[  295.506859] CP_SCRATCH_REG2: 80773
[  295.509925] CP_SCRATCH_REG3: 0
[  295.513305] CP_SCRATCH_REG4: 0
[  295.516346] CP_SCRATCH_REG5: 487267
[  295.519382] CP_SCRATCH_REG6: 487276
[  295.522746] CP_SCRATCH_REG7: 487284
[  295.604584] msm 900000.mdss: gpu fault ring 1 fence 13ba4 status C0040101 rb 0456/0456 ib1 0000000003AF2000/0000 ib2 0000000003AF3000/0000
[  295.604639] msm 900000.mdss: A530: hangcheck recover!
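For anyone trying to compare captures, here is a minimal sketch of pulling the fields out of the "gpu fault ring" lines, assuming Python 3 and that the log has been saved as text. The function name `parse_gpu_faults` and the regular expression are illustrative, not part of any driver tooling. The fence value in the fault line appears to be hexadecimal: 13b85 is 80773 in decimal, which matches the `rb 1: fence` and CP_SCRATCH_REG2 values in the dump above.

```python
import re

# Hypothetical helper: the pattern is written against the exact message
# format seen in the log above and may need adjusting for other kernels.
GPU_FAULT_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+msm\s+\S+:\s+gpu fault ring (?P<ring>\d+) "
    r"fence (?P<fence>[0-9a-fA-F]+) status (?P<status>[0-9a-fA-F]+) "
    r"rb (?P<rptr>[0-9a-fA-F]+)/(?P<wptr>[0-9a-fA-F]+)"
)


def parse_gpu_faults(text):
    """Return one dict per 'gpu fault ring' line found in a dmesg capture."""
    faults = []
    for m in GPU_FAULT_RE.finditer(text):
        faults.append({
            "time": float(m.group("time")),       # seconds, as printed by dmesg
            "ring": int(m.group("ring")),          # ring index, printed in decimal
            "fence": int(m.group("fence"), 16),    # assumed hex (see note above)
            "status": int(m.group("status"), 16),
            "rptr": int(m.group("rptr"), 16),
            "wptr": int(m.group("wptr"), 16),
        })
    return faults
```

Feeding it the output of `dmesg` gives a list that can be counted or sorted by time, which makes it easier to see how many faults occur per run.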
 
Our procedure to reproduce this "gpu fault ring" message is to start the glxgears application while the dragonboard820c is at room temperature in a temp chamber, then lower the chamber temperature until the value read from /sys/class/thermal/thermal_zone0/temp is in the critical temperature zone (7C to 12C). We lower the temperature slowly through the full range until we see the above dmesg output. In that range, glxgears becomes much choppier and sometimes crashes. Has anyone seen anything like this on the Qualcomm Flight Pro?
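To line up temperature with the dmesg timestamps during a chamber run, a logger along the following lines may help. This is a minimal sketch assuming Python 3 on the board. The sysfs path and the 7C to 12C window come from this post, the function names are made up for illustration, and the value in the file is in millidegrees Celsius (7300 means 7.3C). Treating `time.monotonic()` as roughly comparable to the dmesg time base is an assumption that should be checked on the target.

```python
import time

# Path taken from the post; other boards may expose the SoC sensor elsewhere.
THERMAL_PATH = "/sys/class/thermal/thermal_zone0/temp"


def parse_millidegrees(raw):
    # sysfs reports millidegrees Celsius as text, e.g. "7300" plus a newline.
    return int(raw.strip()) / 1000.0


def in_critical_range(temp_c, low_c=7.0, high_c=12.0):
    # Window bounds are the 7C to 12C range described above.
    return low_c <= temp_c <= high_c


def read_temp_c(path=THERMAL_PATH):
    with open(path) as f:
        return parse_millidegrees(f.read())


def log_temperature(samples, interval_s=1.0, path=THERMAL_PATH):
    # Print one line per sample, flagging readings inside the window.
    for _ in range(samples):
        temp_c = read_temp_c(path)
        flag = " <-- in 7C-12C window" if in_critical_range(temp_c) else ""
        print(f"{time.monotonic():.3f} {temp_c:.1f}C{flag}")
        time.sleep(interval_s)
```

Calling `log_temperature(600)` in one terminal while glxgears runs would record ten minutes of readings at one-second intervals.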
 
Thanks,
Kim
