We're struggling with this, and it seems much worse on our newer GLES 2.0 games than it was during our 1.x development. We now have 4 titles badly affected by this bug and more on the way, and the only workaround we know of is calling glFinish() before eglSwapBuffers, which destroys the framerate.
Here's a bug filed on the Android bug tracker - http://code.google.com/p/android/issues/detail?id=20833
Could someone from the driver team please tell us what was fixed in the later 200 and 205 drivers that was causing this issue, so that we have a chance of changing the way we do some things in GL to work around it?
For those of you who don't know, you can get pretty far into development and then start having the device freeze with logs that say:
10-18 19:55:23.890: WARN/SharedBufferStack(215): waitForCondition(LockCondition) timed out (identity=4, status=0). CPU may be pegged. trying again.
There is no way for the user to recover except to pull the battery from the device. That leads to terrible reviews of your game, even when its GL code and memory management appear to be picture-perfect.
We are desperate for ANY information on this from Qualcomm/Adreno developers! Please help!
Affected models of phone are: Nexus One, EVO 4G and Incredible, for which there are many millions still in use.
We aren't asking for a fix to be pushed out to the devices but just for information leading to a feasible workaround that doesn't involve glFinish(), which wrecks the framerates for us on these devices.
Thank you!
This bug occurs on all Adreno 200 series GPU Android devices.
glFinish does not fix it, but it reduces the probability of the bug occurring.
Dear Qualcomm, please type "Cpu may be pegged" into Google search:
almost all OpenGL developers see your driver bug.
It makes the development process for devices with your hardware very, very bad.
Please tell us what can cause this bug.
We have faced the same problem recently. It seems to occur on every 200 series GPU, and glFinish does not help, as noted above.
We would greatly appreciate any info on the nature of this problem. It really prevents us from going ahead with our development. The half-year development results are at risk. Please help!
Could either of you comment on whether your apps are using a standard GLSurfaceView, or a custom SurfaceView? Also which Android OS is loaded on the devices?
thanks,
mark
Affected Android versions are those with OpenGL ES 2.0 support.
So without JNI, that means Android 2.2 and all versions above it. Honeycomb tablets are also affected: http://code.google.com/p/android/issues/detail?id=7432 (comment 13)
The hang occurs inside eglSwapBuffers, so it does not matter which SurfaceView we use. I see this bug with the standard GLSurfaceView and also with custom versions of SurfaceView.
glFlush and glFinish DO NOT solve the problem: the time before the bug shows up increases greatly, but the bug persists.
Our latest app can run for more than 30 hours before we see this bug, but it can also occur after 5 minutes of running.
It seems that the final freeze occurs in SharedBufferBase::waitForCondition in surfaceflinger_client\SharedBufferStack.cpp,
this code block:
while ((condition()==false) && (stack.identity == identity) && (stack.status == NO_ERROR)) {
status_t err = client.cv.waitRelative(client.lock, TIMEOUT);
Also, I've never seen this bug when I use one and only one program object:
glUseProgram(oneP);
bla bla bla
The probability of seeing the bug increases when there is more than one program object.
If your phone is rooted, a stack trace (using gdb) of all the threads of the system_server process will help in narrowing down this issue.
Mark - I use stock GLSurfaceView.
The bug is the driver freezing during swap buffers, not something in Java land. The waitForCondition log is a symptom and a red herring: it only shows that something acquired a lock and never returned, and that thing is the Adreno driver. There is no need to debug the SharedBuffer code, which is actually working correctly. We just need to know what was fixed after the early 200 driver branch so that we can figure out how to work around it and support all these older devices. Your driver guys should definitely know about this, since they fixed it in the later 200 drivers and from the 205 on.
Hi,
We're also experiencing the same issue and it's keeping us from publishing our first title. Adding a call to glFinish after the call to eglSwapBuffers has worked for us, but we can't use that "solution" as it needlessly drops the frame rate by 10-20 frames per second.
We're using native activities and OpenGL context handling happens on a thread other than where ALooper_pollAll is called, which we use for reading in events. I've tried synchronizing with the event handling thread by waiting for all rendering to finish (including the call to eglSwapBuffers) before processing new messages but the symptoms are the same.
I'm testing on an HTC Desire with Android 2.3.3 (the official HTC ROM).
I also did some digging into the SurfaceFlinger process and here's the output if it helps:
DUMP OF SERVICE SurfaceFlinger:
Visible layers (count = 4)
+ Layer 0x570b88
z= 21000, pos=( 0, 0), size=( 480, 800), needsBlending=0, needsDithering=0, invalidate=0, alpha=0xff, flags=0x00000001, tr=[1.00, 0.00][0.00, 1.00] pid=122
name=com.android.internal.service.wallpaper.ImageWallpaper
client=0x39a7c8, identity=8
[ head= 1, available= 2, queued= 0 ] reallocMask=00000000, identity=8, status=0
format= 4, [480x800:480] [480x800:480], freezeLock=0x0, bypass=0, dq-q-time=2899 us
Region transparentRegion (this=0x570d24, count=1)
[ 0, 0, 0, 0]
Region transparentRegionScreen (this=0x570bc4, count=1)
[ 0, 0, 0, 0]
Region visibleRegionScreen (this=0x570ba0, count=1)
[ 0, 0, 0, 0]
+ Layer 0xa84390
z= 21010, pos=( 0, 0), size=( 480, 800), needsBlending=0, needsDithering=0, invalidate=0, alpha=0xff, flags=0x00000000, tr=[1.00, 0.00][0.00, 1.00] pid=727
name=com.ri.BubblingUp/android.app.NativeActivity
client=0xae6a40, identity=14
[ head= 0, available= 0, queued= 0 ] reallocMask=00000000, identity=14, status=0
format= 4, [480x800:480] [480x800:480], freezeLock=0x0, bypass=0, dq-q-time=4289738567 us
Region transparentRegion (this=0xa8452c, count=1)
[ 0, 0, 0, 0]
Region transparentRegionScreen (this=0xa843cc, count=1)
[ 0, 0, 0, 0]
Region visibleRegionScreen (this=0xa843a8, count=1)
[ 0, 0, 480, 800]
+ Layer 0x5bfb10
z= 51005, pos=( 0,-800), size=( 480, 714), needsBlending=1, needsDithering=0, invalidate=0, alpha=0xff, flags=0x00000004, tr=[1.00, 0.00][0.00, 1.00] pid=195
name=StatusBarExpanded
client=0x45a918, identity=5
[ head= 0, available= 2, queued= 0 ] reallocMask=40000000, identity=5, status=0
format= 1, [480x714:480] [ 0x 0: 0], freezeLock=0x0, bypass=0, dq-q-time=3662 us
Region transparentRegion (this=0x5bfcac, count=1)
[ 0, 0, 0, 0]
Region transparentRegionScreen (this=0x5bfb4c, count=1)
[ 0, 0, 0, 0]
Region visibleRegionScreen (this=0x5bfb28, count=1)
[ 0, 0, 0, 0]
+ Layer 0x4a1910
z= 81000, pos=( 0, 0), size=( 480, 38), needsBlending=0, needsDithering=0, invalidate=0, alpha=0x00, flags=0x00000001, tr=[1.00, 0.00][0.00, 1.00] pid=195
name=StatusBar
client=0x45a918, identity=4
[ head= 0, available= 2, queued= 0 ] reallocMask=00000000, identity=4, status=0
format= 2, [480x 38:480] [480x 38:480], freezeLock=0x0, bypass=0, dq-q-time=1068 us
Region transparentRegion (this=0x4a1aac, count=1)
[ 0, 0, 0, 0]
Region transparentRegionScreen (this=0x4a194c, count=1)
[ 0, 0, 0, 0]
Region visibleRegionScreen (this=0x4a1928, count=1)
[ 0, 0, 0, 0]
Purgatory state (0 entries)
SurfaceFlinger global state
Region WormholeRegion (this=0xb46cc, count=1)
[ 0, 0, 0, 0]
display frozen: no, freezeCount=0, orientation=0, bypass=0x0, canDraw=1
last eglSwapBuffers() time: 12786.865000 us
last transaction time : 30.517000 us
Allocated buffers:
0x4a2c30: 71.25 KiB | 480 ( 480) x 38 | 2 | 0x00000133 | 195
0x4e0dd8: 1500.00 KiB | 480 ( 480) x 800 | 1 | 0x00000303 | 727
0x55d068: 71.25 KiB | 480 ( 480) x 38 | 2 | 0x00000133 | 195
0x55d148: 1338.75 KiB | 480 ( 480) x 714 | 1 | 0x00000133 | 195
0x572690: 750.00 KiB | 480 ( 480) x 800 | 4 | 0x00000133 | 122
0x5f27f0: 750.00 KiB | 480 ( 480) x 800 | 4 | 0x00000133 | 122
0xacde10: 1500.00 KiB | 480 ( 480) x 800 | 1 | 0x00000303 | 727
Total allocated: 5981.25 KB
Robert - Is it possible for you to point us to your apk which demonstrates the waitForCondition error? We need to be able to reproduce the problem here in order to suggest changes...
Any progress on this on your part, Robert?
I've tried pretty much everything to no avail. I sent a freezing build of our game to Qualcomm and I'm hoping we could finally resolve this issue and walk into the sunset :) What a relief that would be...
-Timo
Hi Timo,
Yes actually we have (we think) solved it for our engine. I thought I had posted this message already but it looks like I didn't submit after the preview.
We were profiling and optimizing a newer game and were shifting some of our rendering code around a little to reduce redundant state changes. We saw that we had a ton of shader binds and unbinds and realized that we hadn't cached the shader binding between different entity draws, so we put that caching in, which reduced our total shader binds to about 7 per frame. Then we saw all of the glUseProgram(0) and glDisableVertexAttribArray(n) calls we had in there between binding changes and thought maybe we could remove those altogether. After some testing, we found that they were totally unnecessary. We also made it so that only the debug version of the game calls glGetError(). With all of those things combined, we cut the number of GL calls per frame in half or less. As a side effect, the problem we were having on the Adreno 200 seemingly went away. We didn't change our shaders or anything, just removed any unnecessary GL calls, and in my last longevity test the game ran until the battery died on the EVO 4G.
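For anyone who wants to try the same thing, here is a minimal sketch of the kind of bind caching described above. It is not our engine code: the class name ProgramBindCache and its methods are made up for illustration, and the bind callback stands in for whatever wraps glUseProgram in your renderer, so that the sketch compiles without a GL context.

```cpp
#include <functional>
#include <utility>

// Hypothetical helper (not from any real engine): remembers the last program
// handle passed to the bind function and skips the call when the same handle
// is requested again.
class ProgramBindCache {
public:
    using BindFn = std::function<void(unsigned int)>;

    // 'bind' is an assumption: in a real renderer it would forward to
    // glUseProgram(program).
    explicit ProgramBindCache(BindFn bind) : bind_(std::move(bind)) {}

    // Returns true if a real bind was issued, false if it was skipped.
    bool use(unsigned int program) {
        if (hasBound_ && program == current_) {
            return false;  // already bound, nothing to send to the driver
        }
        bind_(program);
        current_ = program;
        hasBound_ = true;
        return true;
    }

    // Forget the cached handle, e.g. after the GL context is recreated or
    // after any bind that bypassed this cache.
    void invalidate() { hasBound_ = false; }

private:
    BindFn bind_;
    unsigned int current_ = 0;
    bool hasBound_ = false;
};
```

Each entity draw would call use(handle) instead of binding directly, and nothing binds program 0 between draws. The cache has to be invalidated whenever the context is lost, otherwise the first bind after recreation could be skipped.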
So - If your app is desktop/mobile using something like BatteryTech that we use, I recommend running the desktop build in gDEBugger to remove everything unnecessary, and then, if it's stable on the 200, also working on it in the Adreno profiler. You could also try adding glFinish() so you can use just the Adreno profiler, but I like to use a number of different profilers because they all give you a little something different.
Hi Robert,
Great to hear that!
I found this "discussion" a while ago: http://forum.unity3d.com/threads/116353-Recieving-CPU-Pegged-Error-Followed-by-Audio-Flinger-Output. The post got me thinking about our shaders so I decided to take a look at them. At first I took a look at our skinning shader that unpacked bone weights and indices, nothing very special. I removed that code for testing purposes and I have not been able to reproduce the issue even with hours of gameplay. I have this theory that the GPU might have actually crashed, which in turn crashed the rendering thread, which in turn hung the android event calls, which in turn led to the phone not automatically booting and the hang of the GPU led to the screen not updating even though the application was running in the background.
For now I simply hope my fix is enough. It's not a very good one as it means we can't use skinning in our shaders at the current time. Waiting for Qualcomm to say something about the shader as to what could be the issue...
The game is now published and available on Google Play: https://play.google.com/store/apps/details?id=com.ri.BubblingUp.
-Timo
Hi Timo,
I need to correct my previous post. The problem seemingly went away on one device (the worst crasher) and got worse on one of the not so bad crashers... Which is strange. We use a loop in the shaders to calculate multiple point lights, and my suspicion is that it's that loop that is causing an issue.
We have in fact tested GPU skinning with 31 bones without any issues though, so it's certainly possible. I'd be happy to share code and in fact, perhaps we should analyze each other's code, because if yours causes crashing and mine doesn't, maybe there's something to this. Please email me rgreen at battery powered games dot com and we can figure this out a little more.
It seems that the problem was related to a shader that contained a clip/discard instruction. Without those shaders, the game does not crash at all on the A205.