Forums - What is causing freezing / SharedBufferStack waitForCondition on early Adreno 200 devices?

Robert_Green
Join Date: 19 Oct 10
Posts: 5
Posted: Wed, 2012-04-04 16:01

We're struggling with this, and it seems much worse on our newer GLES 2.0 games than when we were doing 1.x development.  We now have 4 titles badly affected by this bug, and more on the way, with no way to work around it short of destroying the framerate by calling glFinish() before eglSwapBuffers.

Here's a bug filed on the Android bug tracker - http://code.google.com/p/android/issues/detail?id=20833

Could someone from the driver team please tell us what was fixed in the later 200 and 205 drivers, so that we have a chance of changing how we use GL to work around it?

For those of you who don't know, you can get pretty far into development and then start having the device freezing with logs that say:

10-18 19:55:23.890: WARN/SharedBufferStack(215): waitForCondition(LockCondition) timed out (identity=4, status=0). CPU may be pegged. trying again. 

There is no user recovery except pulling the battery from the device. That leads to terrible reviews for a game whose GL code and memory management may appear picture-perfect.

We are desperate for ANY information on this from Qualcomm/Adreno developers!  Please help!

Affected models of phone are:  Nexus One, EVO 4G and Incredible, for which there are many millions still in use.

We aren't asking for a fix to be pushed out to the devices but just for information leading to a feasible workaround that doesn't involve glFinish(), which wrecks the framerates for us on these devices.

Thank you!

ivan_starkov
Join Date: 10 Apr 12
Posts: 7
Posted: Tue, 2012-04-10 04:21

This bug occurs on all Adreno 200 series GPU Android devices.
glFinish does not fix it, but it does reduce the probability of the bug occurring.
Dear Qualcomm, please type "CPU may be pegged" into a Google search:
almost all OpenGL developers see what appears to be your driver bug.
It makes developing for devices with your hardware very, very painful.

Please tell us what can cause this bug.

Sergey_Kirpichenko
Join Date: 10 Apr 12
Posts: 3
Posted: Tue, 2012-04-10 05:36

We ran into the same problem recently. It seems to occur on every 200 series GPU, and glFinish does not help, as noted above.

We would greatly appreciate any info on the nature of this problem. It really prevents us from going ahead with our development. Half a year of development work is at risk. Please help!

Mark_Feldman Moderator
Join Date: 4 Jan 12
Posts: 58
Posted: Wed, 2012-04-11 10:15

Could either of you comment on whether your apps are using a standard GLSurfaceView, or a custom SurfaceView?  Also, which Android OS version is loaded on the devices?

thanks,

mark

ivan_starkov
Join Date: 10 Apr 12
Posts: 7
Posted: Wed, 2012-04-11 10:50

Affected Android versions are those with OpenGL ES 2.0 support. Without JNI, that means Android 2.2 and above. Honeycomb tablets are also affected: http://code.google.com/p/android/issues/detail?id=7432 (comment 13)

The hang occurs inside eglSwapBuffers, so it does not matter which SurfaceView we use. I see this bug with the standard GLSurfaceView and also with custom versions of SurfaceView.

glFlush and glFinish DO NOT solve the problem: the time before the bug appears increases greatly, but the bug persists.
Our latest app can run for more than 30 hours before we see this bug, but it can also occur after 5 minutes of running.

ivan_starkov
Join Date: 10 Apr 12
Posts: 7
Posted: Wed, 2012-04-11 21:49

It seems that the final freeze occurs in SharedBufferBase::waitForCondition in surfaceflinger_client\SharedBufferStack.cpp,

in this code block:

    while ((condition()==false) && (stack.identity == identity) && (stack.status == NO_ERROR)) {
        status_t err = client.cv.waitRelative(client.lock, TIMEOUT);

It seems that err always equals TIMED_OUT and condition() stays false (the condition here is a LockCondition), and as a result we get "CPU may be pegged".
So you need to find out who takes this lock in the OpenGL pipeline and why it is never released (a deadlock, etc.).
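
For anyone who has not read the framework source, here is a minimal C++ sketch of the retry pattern in the loop quoted above. It uses std::condition_variable in place of Android's own Condition class, so it is an approximation, not the framework code. The name waitWithTimeouts and the max_timeouts bound are hypothetical and exist only so the sketch terminates; the quoted loop has no such bound, which would explain why the warning repeats forever if nothing ever satisfies the condition.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>

// Waits until ready() returns true, waking up every `timeout` to notice that
// the condition is still unmet. Returns the number of timeouts observed.
// Assumption: max_timeouts is added for this sketch only.
int waitWithTimeouts(std::mutex& m, std::condition_variable& cv,
                     const std::function<bool()>& ready,
                     std::chrono::milliseconds timeout, int max_timeouts) {
    assert(max_timeouts > 0);
    std::unique_lock<std::mutex> lock(m);
    int timeouts = 0;
    while (!ready() && timeouts < max_timeouts) {
        // wait_for releases the lock while sleeping and re-acquires it on wake.
        if (cv.wait_for(lock, timeout) == std::cv_status::timeout) {
            // A timed-out wait with the condition still false is the point
            // where a "timed out ... trying again" warning would be logged.
            ++timeouts;
        }
    }
    return timeouts;
}
```

If the party that is supposed to satisfy the condition never does, every pass through the loop ends in the timeout branch, which matches the endless log spam seen on the device.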

Also, I have never seen this bug when I use one and only one program object:

glUseProgram(oneP);
// ... all drawing ...

The probability of seeing the bug increases when there is more than one program object.

Naseer_Ahmed
Join Date: 14 Nov 10
Posts: 3
Posted: Thu, 2012-04-12 08:46

If your phone is rooted, a stack trace (using gdb) of all the threads of the system_server process will help in narrowing down this issue.

Robert_Green
Join Date: 19 Oct 10
Posts: 5
Posted: Thu, 2012-04-12 14:09

Mark - I use stock GLSurfaceView.

The bug is the driver freezing during swap buffers, not something in java land.  The symptom and red herring is the waitForCondition log - that's just showing that something acquired a lock and never returned, and that thing is the Adreno driver.  No need to debug the SharedBuffer code - it's actually working correctly.  We just need to know what was fixed after the early 200 driver branch so that we can figure out how to work around it and support all these older devices.  Your driver guys should definitely know about this - they fixed it in later 200 and 205 and on.

Timo_Heinapurola
Join Date: 18 Apr 12
Posts: 8
Posted: Wed, 2012-04-18 09:08

Hi,

We're also experiencing the same issue, and it's keeping us from publishing our first title. Adding a call to glFinish after the call to eglSwapBuffers has worked for us, but we can't use that "solution" as it needlessly drops the frame rate by 10-20 frames per second.

We're using native activities and OpenGL context handling happens on a thread other than where ALooper_pollAll is called, which we use for reading in events. I've tried synchronizing with the event handling thread by waiting for all rendering to finish (including the call to eglSwapBuffers) before processing new messages but the symptoms are the same.

I'm testing on an HTC Desire with Android 2.3.3 (the official HTC ROM).

I also did some digging into the SurfaceFlinger process and here's the output if it helps:

DUMP OF SERVICE SurfaceFlinger:
Visible layers (count = 4)
+ Layer 0x570b88
      z=    21000, pos=(   0,   0), size=( 480, 800), needsBlending=0, needsDithering=0, invalidate=0, alpha=0xff, flags=0x00000001, tr=[1.00, 0.00][0.00, 1.00] pid=122
      name=com.android.internal.service.wallpaper.ImageWallpaper
      client=0x39a7c8, identity=8
      [ head= 1, available= 2, queued= 0 ] reallocMask=00000000, identity=8, status=0
      format= 4, [480x800:480] [480x800:480], freezeLock=0x0, bypass=0, dq-q-time=2899 us
  Region transparentRegion (this=0x570d24, count=1)
    [  0,   0,   0,   0]
  Region transparentRegionScreen (this=0x570bc4, count=1)
    [  0,   0,   0,   0]
  Region visibleRegionScreen (this=0x570ba0, count=1)
    [  0,   0,   0,   0]
+ Layer 0xa84390
      z=    21010, pos=(   0,   0), size=( 480, 800), needsBlending=0, needsDithering=0, invalidate=0, alpha=0xff, flags=0x00000000, tr=[1.00, 0.00][0.00, 1.00] pid=727
      name=com.ri.BubblingUp/android.app.NativeActivity
      client=0xae6a40, identity=14
      [ head= 0, available= 0, queued= 0 ] reallocMask=00000000, identity=14, status=0
      format= 4, [480x800:480] [480x800:480], freezeLock=0x0, bypass=0, dq-q-time=4289738567 us
  Region transparentRegion (this=0xa8452c, count=1)
    [  0,   0,   0,   0]
  Region transparentRegionScreen (this=0xa843cc, count=1)
    [  0,   0,   0,   0]
  Region visibleRegionScreen (this=0xa843a8, count=1)
    [  0,   0, 480, 800]
+ Layer 0x5bfb10
      z=    51005, pos=(   0,-800), size=( 480, 714), needsBlending=1, needsDithering=0, invalidate=0, alpha=0xff, flags=0x00000004, tr=[1.00, 0.00][0.00, 1.00] pid=195
      name=StatusBarExpanded
      client=0x45a918, identity=5
      [ head= 0, available= 2, queued= 0 ] reallocMask=40000000, identity=5, status=0
      format= 1, [480x714:480] [  0x  0:  0], freezeLock=0x0, bypass=0, dq-q-time=3662 us
  Region transparentRegion (this=0x5bfcac, count=1)
    [  0,   0,   0,   0]
  Region transparentRegionScreen (this=0x5bfb4c, count=1)
    [  0,   0,   0,   0]
  Region visibleRegionScreen (this=0x5bfb28, count=1)
    [  0,   0,   0,   0]
+ Layer 0x4a1910
      z=    81000, pos=(   0,   0), size=( 480,  38), needsBlending=0, needsDithering=0, invalidate=0, alpha=0x00, flags=0x00000001, tr=[1.00, 0.00][0.00, 1.00] pid=195
      name=StatusBar
      client=0x45a918, identity=4
      [ head= 0, available= 2, queued= 0 ] reallocMask=00000000, identity=4, status=0
      format= 2, [480x 38:480] [480x 38:480], freezeLock=0x0, bypass=0, dq-q-time=1068 us
  Region transparentRegion (this=0x4a1aac, count=1)
    [  0,   0,   0,   0]
  Region transparentRegionScreen (this=0x4a194c, count=1)
    [  0,   0,   0,   0]
  Region visibleRegionScreen (this=0x4a1928, count=1)
    [  0,   0,   0,   0]
Purgatory state (0 entries)
SurfaceFlinger global state
  Region WormholeRegion (this=0xb46cc, count=1)
    [  0,   0,   0,   0]
  display frozen: no, freezeCount=0, orientation=0, bypass=0x0, canDraw=1
  last eglSwapBuffers() time: 12786.865000 us
  last transaction time     : 30.517000 us
Allocated buffers:
  0x4a2c30:   71.25 KiB |  480 ( 480) x   38 |        2 | 0x00000133 |  195
  0x4e0dd8: 1500.00 KiB |  480 ( 480) x  800 |        1 | 0x00000303 |  727
  0x55d068:   71.25 KiB |  480 ( 480) x   38 |        2 | 0x00000133 |  195
  0x55d148: 1338.75 KiB |  480 ( 480) x  714 |        1 | 0x00000133 |  195
  0x572690:  750.00 KiB |  480 ( 480) x  800 |        4 | 0x00000133 |  122
  0x5f27f0:  750.00 KiB |  480 ( 480) x  800 |        4 | 0x00000133 |  122
  0xacde10: 1500.00 KiB |  480 ( 480) x  800 |        1 | 0x00000303 |  727
Total allocated: 5981.25 KB

Mark_Feldman Moderator
Join Date: 4 Jan 12
Posts: 58
Posted: Thu, 2012-04-19 08:06

Robert - Is it possible for you to point us to your APK which demonstrates the waitForCondition error?  We need to be able to reproduce the problem here in order to suggest changes.

Timo_Heinapurola
Join Date: 18 Apr 12
Posts: 8
Posted: Tue, 2012-04-24 09:21

Any progress on this on your part, Robert?

I've tried pretty much everything to no avail. I sent a freezing build of our game to Qualcomm and I'm hoping we could finally resolve this issue and walk into the sunset :) What a relief that would be...

-Timo

Robert_Green
Join Date: 19 Oct 10
Posts: 5
Posted: Tue, 2012-04-24 10:52

Hi Timo,

Yes actually we have (we think) solved it for our engine.  I thought I had posted this message already but it looks like I didn't submit after the preview.  

We were profiling and optimizing a newer game and shifting some of our rendering code around to reduce redundant state changes when we saw that we had a ton of shader binds and unbinds. We realized that we hadn't cached the shader binding between different entity draws, so we put that caching in, reducing our total shader binds to about 7 per frame.  Then we saw all of the glUseProgram(0) and glDisableVertexAttribArray(n) calls we had between binding changes and thought maybe we could remove those altogether. After some testing, we found that they were totally unnecessary.  We also made it so that only the debug version of the game calls glGetError(). With all of those things combined, we cut the number of GL calls per frame in half or less.  As a side effect, the problem we were having on the Adreno 200 seemingly went away.  We didn't change our shaders or anything, just removed unnecessary GL calls, and in my last longevity test the game ran until the battery died on the EVO 4G.
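
The shader bind caching described above can be sketched roughly as follows. This is not our engine code: ProgramBindCache is a hypothetical helper, and the driver call is injected as a callback so the sketch compiles without GL headers. In a GLES 2.0 renderer the callback would presumably just wrap glUseProgram.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper: remembers the last program handle handed to the
// driver and skips the call when the same handle is requested again.
class ProgramBindCache {
public:
    using BindFn = std::function<void(std::uint32_t)>;

    // bind is the real binding call, e.g. a lambda that calls glUseProgram.
    explicit ProgramBindCache(BindFn bind) : bind_(std::move(bind)) {
        assert(bind_ && "a bind callback is required");
    }

    // Returns true if a driver call was actually issued.
    bool use(std::uint32_t program) {
        if (valid_ && program == current_) {
            return false;  // redundant bind, skipped
        }
        bind_(program);
        current_ = program;
        valid_ = true;
        return true;
    }

    // Call after context loss, or when other code binds programs directly.
    void invalidate() { valid_ = false; }

private:
    BindFn bind_;
    std::uint32_t current_ = 0;
    bool valid_ = false;
};
```

Every draw goes through use() instead of binding directly, so consecutive entities that share a shader cost one bind rather than one per entity.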

So - if your app is desktop/mobile using something like BatteryTech that we use, I recommend running the desktop build in gDEBugger to remove everything unnecessary, and then, if it's stable on the 200, also working on it in the Adreno profiler.  You could also try adding glFinish() so you can use just the Adreno profiler, but I like to use a number of different profilers because they all give you a little something different.
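
Making glGetError() debug-only, as mentioned above, could look something like this sketch. RENDER_DEBUG_CHECKS is a hypothetical build switch (a real project might key off NDEBUG instead), and queryError stands in for glGetError so the example builds without GL headers. The value 0 corresponds to GL_NO_ERROR.

```cpp
#include <cassert>
#include <functional>

// Hypothetical build switch: define as 1 for debug builds.
#ifndef RENDER_DEBUG_CHECKS
#define RENDER_DEBUG_CHECKS 0
#endif

// Drains pending errors only when checks are compiled in, and returns how
// many were found. In release builds the error query is never called, so
// it adds no per-frame driver calls.
inline int drainErrors(const std::function<unsigned()>& queryError) {
    assert(queryError && "an error query callback is required");
#if RENDER_DEBUG_CHECKS
    int count = 0;
    // The error query reports one flag per call, so loop until it is clear.
    while (queryError() != 0u) {
        ++count;
    }
    return count;
#else
    return 0;
#endif
}
```

With the switch left at 0, a call such as drainErrors([] { return glGetError(); }) compiles down to nothing more than returning 0.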

Timo_Heinapurola
Join Date: 18 Apr 12
Posts: 8
Posted: Mon, 2012-04-30 15:48

Hi Robert,

Great to hear that!

I found this "discussion" a while ago: http://forum.unity3d.com/threads/116353-Recieving-CPU-Pegged-Error-Followed-by-Audio-Flinger-Output. The post got me thinking about our shaders, so I decided to take a look at them. I started with our skinning shader, which unpacks bone weights and indices, nothing very special. I removed that code for testing purposes and have not been able to reproduce the issue since, even with hours of gameplay. My theory is that the GPU actually crashed, which crashed the rendering thread, which in turn hung the Android event calls, which kept the phone from rebooting automatically. The hung GPU would also explain why the screen stopped updating even though the application was still running in the background.

For now I simply hope my fix is enough. It's not a very good one as it means we can't use skinning in our shaders at the current time. Waiting for Qualcomm to say something about the shader as to what could be the issue...

The game is now published and available on Google Play: https://play.google.com/store/apps/details?id=com.ri.BubblingUp.

-Timo

Robert_Green
Join Date: 19 Oct 10
Posts: 5
Posted: Mon, 2012-04-30 15:52

Hi Timo,

I need to correct my previous post.  The problem seemingly went away on one device (the worst crasher) and got worse on one of the not so bad crashers... Which is strange.  We use a loop in the shaders to calculate multiple point lights, and my suspicion is that it's that loop that is causing an issue.

We have in fact tested GPU skinning with 31 bones without any issues though, so it's certainly possible.  I'd be happy to share code, and in fact perhaps we should analyze each other's code, because if yours causes crashing and mine doesn't, maybe there's something to this.  Please email me at rgreen at battery powered games dot com and we can figure this out a little more.

sampsa.lehtonen
Join Date: 23 Jan 13
Posts: 3
Posted: Wed, 2013-01-23 07:31

I'm working on a game that suffers from the same issue. The game is developed using Unity3D. It crashes on at least two Adreno 205 based phones (Sony Xperia Arc S and HTC Desire S) and runs perfectly on Adreno 220 (Sony Xperia S).

If Qualcomm wants to look into this, I'd be happy to provide an APK. Contact me at sampsa (at) recoilgames.com

Cheers, Sampsa / Recoil Games
sampsa.lehtonen
Join Date: 23 Jan 13
Posts: 3
Posted: Thu, 2013-02-07 04:24
Just a heads up. We got a crash on Sony Xperia Play (A205), too. So far it has crashed on every A205 device. We also tested on Sony Xperia Miro (A200) and no crashes there.
sampsa.lehtonen
Join Date: 23 Jan 13
Posts: 3
Posted: Tue, 2013-02-12 00:24

It seems that the problem was related to a shader that contained a clip/discard instruction. Without those shaders, the game does not crash at all on the A205.
