Forums - dynamic branching in GLES 3.0 vertex shader

crazii
Join Date: 8 Sep 14
Posts: 13
Posted: Tue, 2015-03-24 07:00

Hi there, I've run into a vertex shader problem on the Adreno 330, tested on a Nexus 5 and a Kindle Fire HDX.

My vertex shader is as follows:

uniform highp vec4 bone_dq[220];
#define FETCH_DQ(i) dq.r = bone_dq[(i)*2]; dq.t = bone_dq[(i)*2+1];
void skin_vertex_weight4(inout vec3 v, inout vec3 tbn_quat, ivec4 bones, lowp vec4 weights)
{
    DQ dq;
    FETCH_DQ( bones[0] ];
    vec4 r0 = dq.r;
    DQ finalDQ;
    finalDQ.r = weights[0] * dq.r;
    finalDQ.t = weights[0] * dq.t;
    for(int i = 1; i < 4; ++i)
    {
        FETCH_DQ( bones[i] );
        if( dot( r0, dq.r) < 0 )
        {
            // it seems this branch is never entered
            finalDQ.r -= weights[i] * dq.r;
            finalDQ.t -= weights[i] * dq.t;
        }
        else
        {
            finalDQ.r += weights[i] * dq.r;
            finalDQ.t += weights[i] * dq.t;
        }
    }
    ...
}
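For reference, here is a minimal C++ sketch of what the loop above is meant to compute, which can be handy for checking GPU output against a CPU result. It assumes the palette is indexed per bone (one DQ holding the two vec4s that FETCH_DQ reads) and folds the if/else into a signed weight, which is equivalent. The names Vec4, dot4 and blend4 are hypothetical, not from the original code.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for GLSL's vec4 and the thread's DQ struct.
using Vec4 = std::array<float, 4>;

struct DQ {
    Vec4 r;  // real part (rotation quaternion)
    Vec4 t;  // dual part (encodes translation)
};

float dot4(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// CPU reference for the skinning loop: blend four bone dual quaternions,
// subtracting any bone whose real part lies in the opposite hemisphere from
// bone 0 (dot < 0), so that q and -q reinforce instead of cancelling.
// `palette` is indexed per bone (one DQ per bone, as FETCH_DQ reads it).
DQ blend4(const DQ* palette, const int bones[4], const float weights[4]) {
    const DQ& first = palette[bones[0]];
    const Vec4 r0 = first.r;
    DQ result{};
    for (int k = 0; k < 4; ++k) {
        result.r[k] = weights[0] * first.r[k];
        result.t[k] = weights[0] * first.t[k];
    }
    for (int i = 1; i < 4; ++i) {
        const DQ& dq = palette[bones[i]];
        // The shader's if/else, folded into a signed weight.
        const float w = dot4(r0, dq.r) < 0.0f ? -weights[i] : weights[i];
        for (int k = 0; k < 4; ++k) {
            result.r[k] += w * dq.r[k];
            result.t[k] += w * dq.t[k];  // dual part accumulates dq.t
        }
    }
    return result;
}
```

Blending the identity rotation with its antipodal copy (0, 0, 0, -1) at equal weights should give the identity back, not zero.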

I also tried a math trick to remove the branch, but that doesn't work either, and unrolling the for loop doesn't help:

float shortPath = sign(sign(dot(r0, dq.r))+0.5);
finalDQ.r += shortPath*weights[i] * dq.r;
finalDQ.t += shortPath*weights[i] * dq.t;
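The factor sign(sign(d) + 0.5) is always ±1: it is -1.0 when d < 0 and +1.0 otherwise, including d == 0, where a bare sign(d) would return 0.0 and silently drop that bone. A small C++ emulation makes this easy to check. glsl_sign and short_path are hypothetical helpers written to mirror GLSL's built-in sign(), not real library functions.

```cpp
#include <cassert>

// Hypothetical emulation of GLSL sign(): -1.0 for x < 0, 0.0 for x == 0, +1.0 for x > 0.
float glsl_sign(float x) {
    return x > 0.0f ? 1.0f : (x < 0.0f ? -1.0f : 0.0f);
}

// The branchless "short path" factor from the post: sign(sign(d) + 0.5).
// The + 0.5 shifts the d == 0 case to +1.0 instead of 0.0.
float short_path(float d) {
    return glsl_sign(glsl_sign(d) + 0.5f);
}
```

Since the math itself always yields ±1, a result identical to plain weighted addition points at the platform rather than at the formula.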

BTW, the same code runs correctly on other GPUs, e.g. ARM Mali Txx. And I can't find any note about dynamic branching in the GLSL ES 3.00 specification.

Is this a bug, or can you give any help? Thanks
:)

crazii
Join Date: 8 Sep 14
Posts: 13
Posted: Tue, 2015-03-24 19:36

Can anyone help?

I can give you my APK and OBB files to test.

Ayo Moderator
Join Date: 23 Jan 15
Posts: 31
Posted: Wed, 2015-03-25 17:22

Hello.

I wrote a small sample app and copy-pasted your shader.

First I added the following after the #define FETCH_DQ line:

struct DQ
{
     vec4 r;
     vec4 t;
};



However, the shader did not compile:

03-25 23:55:18.424: E/Adreno-SC(9788): <CPPErrorToInfoLog:847>: GLSL line 53: Error:  EOF in Macro  FETCH_DQ
03-25 23:55:18.424: E/dynamicBranch(9788): Failed to compile the vertex shader with ERROR: Vertex shader compilation failed.
03-25 23:55:18.424: E/dynamicBranch(9788): ERROR: 0:53: '' :     GLSL compile error:   EOF in Macro  FETCH_DQ
03-25 23:55:18.424: E/dynamicBranch(9788): ERROR: 0:53: 'premature EOF' : Syntax error:  syntax error
03-25 23:55:18.424: E/dynamicBranch(9788): ERROR: 2 compilation errors.  No code generated.
03-25 23:55:18.424: E/dynamicBranch(9788): Failed to initialize the shaders!



Then I changed the following line:

void skin_vertex_weight4(inout vec3 v, inout vec3 tbn_quat, ivec4 bones, lowp vec4 weights)
{
    DQ dq;
    FETCH_DQ( bones[0] ]; // <- change the ']' character before the semicolon



to this:

void skin_vertex_weight4(inout vec3 v, inout vec3 tbn_quat, ivec4 bones, lowp vec4 weights)
{
    DQ dq;
    FETCH_DQ( bones[0] ); // <- change ']' to ')'



That revealed the second reason it wouldn't compile:

03-25 23:58:42.904: E/dynamicBranch(10075): Failed to compile the vertex shader with ERROR: Vertex shader compilation failed.
03-25 23:58:42.904: E/dynamicBranch(10075): ERROR: 0:32: '<' :  wrong operand types  no operation '<' exists that takes a left-hand operand of type 'float' and a right operand of type 'const int' (or there is no acceptable conversion)
03-25 23:58:42.904: E/dynamicBranch(10075): ERROR: 1 compilation errors.  No code generated.
03-25 23:58:42.904: E/dynamicBranch(10075): Failed to initialize the shaders!



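The dot product result was being compared with the integer '0' instead of the float '0.0'. GLSL ES 3.00 performs no implicit int-to-float conversion, so this comparison is a compile error: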

if( dot( r0, dq.r) < 0 ) // change integer '0'



With this last change, the shader compiled successfully:

if( dot( r0, dq.r) < 0.0 ) // change to '0.0'



Could you try all of the above changes and let us know whether it compiles for you? Also, make sure your app checks the shader compiler results, using glGetShaderInfoLog to print the compile log, before attempting to use the shader.

crazii
Join Date: 8 Sep 14
Posts: 13
Posted: Wed, 2015-03-25 23:04

Hi Ayo,

Thank you very much for your patience. That code is an excerpt from my original code and has some typos.

My shader code is the same as your corrected version, except for the 0 => 0.0 part. I'll try it.

Actually, my code is converted from HLSL. If this problem remains unsolved, I'll paste the original HLSL and the converted code next time.

Thanks again. :)

crazii
Join Date: 8 Sep 14
Posts: 13
Posted: Mon, 2015-03-30 02:15
Now I believe there's a bug in the Adreno 3xx series. The same code works fine on a Mali T628.

First, here is the original HLSL I used:

#define BONE_PALETTE_SIZE 110

struct DQ
{
    float4 r;
    float4 t;
};

float4 bone_dq[BONE_PALETTE_SIZE*2] : BONE_PALETTE;
DQ fetchDQ(int index)
{
    DQ dq;
    dq.r = bone_dq[ index*2 ];
    dq.t = bone_dq[ index*2+1 ];
    return dq;
}

//skin vertex for 4 weights
void skin_vertex_tbn_weight4(inout float3 v, inout float4 tbn_quat, int4 bones, half4 weights)
{
    DQ dq = fetchDQ(bones[0]);
    float4 r0 = dq.r;
    DQ finalDQ;
    finalDQ.r = weights[0] * dq.r;
    finalDQ.t = weights[0] * dq.t;
    for(int i = 1; i < 4; ++i)
    {
        dq = fetchDQ(bones[i]);
        float shortPath = sign(sign(dot(r0, dq.r))+0.5);
        finalDQ.r += shortPath*weights[i] * dq.r;
        finalDQ.t += shortPath*weights[i] * dq.t;
    }
    finalDQ = dqnormalize(finalDQ);
    v = dqmul(finalDQ, v);
    tbn_quat = qqmul(finalDQ.r, tbn_quat);
}
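The HLSL above calls dqnormalize, dqmul and qqmul, whose bodies aren't shown in the thread. Judging from the converted GLSL (the division by sqrt(dot(r, r)) via tmpvar_15, and the pos_2.xyz expression), the first two probably compute something like the following C++ sketch. This is an assumption reconstructed from that GLSL, with quaternions stored as (x, y, z, w); qqmul is not covered, and cross3 is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;  // quaternion stored as (x, y, z, w)

struct DQ {
    Vec4 r;  // real part: rotation quaternion
    Vec4 t;  // dual part: encodes the translation
};

// Hypothetical helper: 3D cross product, same as GLSL cross().
Vec3 cross3(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

// Assumed dqnormalize: divide both parts by the length of the real part,
// as the converted GLSL does with tmpvar_15.
DQ dqnormalize(const DQ& dq) {
    const Vec4& r = dq.r;
    const float len = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2] + r[3] * r[3]);
    DQ out{};
    for (int k = 0; k < 4; ++k) {
        out.r[k] = dq.r[k] / len;
        out.t[k] = dq.t[k] / len;
    }
    return out;
}

// Assumed dqmul(dq, v) for a unit dual quaternion: rotate v by dq.r, then add
// the translation 2 * (r.w * t.xyz - t.w * r.xyz + cross(r.xyz, t.xyz)).
Vec3 dqmul(const DQ& dq, const Vec3& v) {
    const Vec3 rv = {dq.r[0], dq.r[1], dq.r[2]};
    const Vec3 tv = {dq.t[0], dq.t[1], dq.t[2]};
    const float rw = dq.r[3];
    const float tw = dq.t[3];
    const Vec3 a = cross3(rv, v);
    const Vec3 c = {2.0f * a[0], 2.0f * a[1], 2.0f * a[2]};  // like tmpvar_16
    const Vec3 d = cross3(rv, c);
    const Vec3 e = cross3(rv, tv);
    Vec3 out{};
    for (int k = 0; k < 3; ++k) {
        out[k] = v[k] + rw * c[k] + d[k] + 2.0f * (rw * tv[k] - tw * rv[k] + e[k]);
    }
    return out;
}
```

With an identity rotation and dual part (0.5, 1, 1.5, 0), dqmul moves a point by (1, 2, 3); a 90 degree rotation about Z maps (1, 0, 0) to (0, 1, 0).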

The animation result is not correct on the Nexus 5 and the Kindle Fire HDX 7'' and 8.9'', as if the "sign(sign(dot()) + 0.5)" operation or the "if" condition had no effect. The result is the same as what the code below, which has no sign correction at all, produces on all devices (even on PC):

...
for(int i = 1; i < 4; ++i)
{
    dq = fetchDQ(bones[i]);
    finalDQ.r += weights[i] * dq.r;
    finalDQ.t += weights[i] * dq.t;
}
...
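Losing the sign correction shows up as broken animation because q and -q encode the same rotation, yet a plain weighted sum can partially or fully cancel them. In the worst case (equal weights, antipodal quaternions) the blended real part has zero length, so the later normalization divides by zero. A minimal C++ illustration; blend_naive, blend_shortest and quat_length are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Quat = std::array<float, 4>;  // rotation quaternion as (x, y, z, w)

float quat_length(const Quat& q) {
    return std::sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
}

// Plain weighted sum, as in the loop above: no hemisphere check.
Quat blend_naive(const Quat& a, float wa, const Quat& b, float wb) {
    Quat out{};
    for (int k = 0; k < 4; ++k) out[k] = wa * a[k] + wb * b[k];
    return out;
}

// Same sum, but b's weight is negated when b lies in the opposite hemisphere
// from a (dot < 0), which is what the shader's sign factor is meant to do.
Quat blend_shortest(const Quat& a, float wa, const Quat& b, float wb) {
    const float d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
    return blend_naive(a, wa, b, d < 0.0f ? -wb : wb);
}
```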

Here's the full converted (and optimized) GLSL code:

#version 300 es
#if defined(ENABLE_VS)
uniform highp vec4 bone_dq[220];
uniform highp mat4 wvp_matrix;
uniform highp mat4 world_matrix;
in highp vec4 blade_position0;
in highp vec4 blade_normal0;
in mediump vec2 blade_texcoord0;
in highp vec4 blade_blendindices0;
in mediump vec4 blade_blendwight0;
out mediump vec2 blade_varying_TEXCOORD0;
out highp vec4 blade_varying_TEXCOORD1;
out highp vec3 blade_varying_TEXCOORD2;
void main ()
{
    ivec4 tmpvar_1;
    tmpvar_1 = ivec4(blade_blendindices0);
    highp vec4 pos_2;
    pos_2.w = blade_position0.w;
    highp vec4 tmpvar_3;
    tmpvar_3 = (((blade_normal0 * 255.0) / 128.0) - 1.0);
    highp vec4 tmpvar_4;
    highp vec4 tmpvar_5;
    highp vec4 tmpvar_6;
    tmpvar_6 = bone_dq[(tmpvar_1.x * 2)];
    tmpvar_4 = (blade_blendwight0.x * tmpvar_6);
    tmpvar_5 = (blade_blendwight0.x * bone_dq[((tmpvar_1.x * 2) + 1)]);
    highp vec4 tmpvar_7;
    tmpvar_7 = bone_dq[(tmpvar_1.y * 2)];
    highp float tmpvar_8;
    tmpvar_8 = sign((sign( dot (tmpvar_6, tmpvar_7) ) + 0.5));
    tmpvar_4 = (tmpvar_4 + ((tmpvar_8 * blade_blendwight0.y) * tmpvar_7));
    tmpvar_5 = (tmpvar_5 + ((tmpvar_8 * blade_blendwight0.y) * bone_dq[( (tmpvar_1.y * 2) + 1)]));
    highp vec4 tmpvar_9;
    tmpvar_9 = bone_dq[(tmpvar_1.z * 2)];
    highp float tmpvar_10;
    tmpvar_10 = sign((sign( dot (tmpvar_6, tmpvar_9) ) + 0.5));
    tmpvar_4 = (tmpvar_4 + ((tmpvar_10 * blade_blendwight0.z) * tmpvar_9));
    tmpvar_5 = (tmpvar_5 + ((tmpvar_10 * blade_blendwight0.z) * bone_dq[( (tmpvar_1.z * 2) + 1)]));
    highp vec4 tmpvar_11;
    tmpvar_11 = bone_dq[(tmpvar_1.w * 2)];
    highp float tmpvar_12;
    tmpvar_12 = sign((sign( dot (tmpvar_6, tmpvar_11) ) + 0.5));
    tmpvar_4 = (tmpvar_4 + ((tmpvar_12 * blade_blendwight0.w) * tmpvar_11));
    tmpvar_5 = (tmpvar_5 + ((tmpvar_12 * blade_blendwight0.w) * bone_dq[( (tmpvar_1.w * 2) + 1)]));
    highp vec4 tmpvar_13;
    highp vec4 tmpvar_14;
    highp float tmpvar_15;
    tmpvar_15 = sqrt(dot (tmpvar_4, tmpvar_4));
    tmpvar_13 = (tmpvar_4 / tmpvar_15);
    tmpvar_14 = (tmpvar_5 / tmpvar_15);
    tmpvar_4 = tmpvar_13;
    tmpvar_5 = tmpvar_14;
    highp vec3 tmpvar_16;
    tmpvar_16 = (((tmpvar_13.yzx * blade_position0.zxy) - (tmpvar_13.zxy * blade_position0.yzx)) * 2.0);
    highp vec4 tmpvar_17;
    tmpvar_17.xyz = (((tmpvar_13.w * tmpvar_3.xyz) + (tmpvar_3.w * tmpvar_13.xyz)) + ((tmpvar_13.yzx * tmpvar_3.zxy) - (tmpvar_13.zxy * tmpvar_3.yzx)));
    tmpvar_17.w = ((tmpvar_13.w * tmpvar_3.w) - dot (tmpvar_13.xyz, tmpvar_3.xyz));
    pos_2.xyz = (((blade_position0.xyz + (tmpvar_16 * tmpvar_13.w) ) + ( (tmpvar_13.yzx * tmpvar_16.zxy) - (tmpvar_13.zxy * tmpvar_16.yzx) )) + (2.0 * ( ((tmpvar_13.w * tmpvar_14.xyz) - (tmpvar_14.w * tmpvar_13.xyz)) + ((tmpvar_13.yzx * tmpvar_14.zxy) - (tmpvar_13.zxy * tmpvar_14.yzx)) )));
    highp vec3 tmpvar_18;
    tmpvar_18 = (((tmpvar_17.yzx * vec3(1.0, 0.0, 0.0)) - (tmpvar_17.zxy * vec3(0.0, 1.0, 0.0))) * 2.0);
    highp mat3 tmpvar_19;
    tmpvar_19[0u] = world_matrix[0u].xyz;
    tmpvar_19[1u] = world_matrix[1u].xyz;
    tmpvar_19[2u] = world_matrix[2u].xyz;
    gl_Position = (pos_2 * wvp_matrix);
    blade_varying_TEXCOORD0 = blade_texcoord0;
    blade_varying_TEXCOORD1 = (pos_2 * world_matrix);
    blade_varying_TEXCOORD2 = (((vec3(0.0, 0.0, 1.0) + (tmpvar_18 * tmpvar_17.w) ) + ( (tmpvar_17.yzx * tmpvar_18.zxy) - (tmpvar_17.zxy * tmpvar_18.yzx) )) * tmpvar_19);
}
#elif defined(ENABLE_FS)
uniform int light_count;
uniform highp vec4 light_vector[8];
uniform highp vec4 light_diffuse[8];
uniform highp vec4 light_ambient;
uniform highp vec4 light_specular[8];
uniform highp vec4 eye_position;
uniform sampler2D diffuseMap;
in mediump vec2 blade_varying_TEXCOORD0;
in highp vec4 blade_varying_TEXCOORD1;
in highp vec3 blade_varying_TEXCOORD2;
layout(location=0) out highp vec4 outBladeColor0;
void main ()
{
    highp vec4 diffuse_1;
    lowp vec4 tmpvar_2;
    tmpvar_2 = texture (diffuseMap, blade_varying_TEXCOORD0);
    diffuse_1 = tmpvar_2;
    highp vec3 worldPos_3;
    worldPos_3 = blade_varying_TEXCOORD1.xyz;
    highp vec3 worldNormal_4;
    worldNormal_4 = normalize(blade_varying_TEXCOORD2);
    highp vec3 eye_dir_6;
    highp vec4 light_7;
    light_7 = light_ambient;
    eye_dir_6 = normalize((eye_position.xyz - blade_varying_TEXCOORD1.xyz));
    for (int i_5 = 0; i_5 < light_count; i_5++)
    {
        highp vec3 tmpvar_8;
        tmpvar_8 = normalize((light_vector[i_5].xyz - (light_vector[i_5].w * worldPos_3)));
        highp float tmpvar_9;
        tmpvar_9 = dot (worldNormal_4, tmpvar_8);
        light_7.xyz = (light_7.xyz + (( max (0.0, tmpvar_9) * light_diffuse[i_5].xyz) + ( pow ((max (0.0, dot (worldNormal_4, normalize((tmpvar_8 + eye_dir_6)) )) * float((tmpvar_9 >= 0.0))), 32.0) * light_specular[i_5].xyz)));
    }
    outBladeColor0 = (light_7 * diffuse_1);
}
#else
#error switch not defined.
#endif
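For readers following the fragment stage, one iteration of its light loop corresponds roughly to the C++ sketch below: diffuse max(0, N·L) plus a Blinn-Phong specular term with exponent 32 that is gated off when N·L < 0. Reading light_vector.w as 0 for a directional light and 1 for a positional one is an interpretation of light_vector[i].xyz - light_vector[i].w * worldPos; the function and parameter names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

float dot3(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

Vec3 normalize3(const Vec3& v) {
    const float len = std::sqrt(dot3(v, v));
    return {v[0] / len, v[1] / len, v[2] / len};
}

// Hypothetical CPU version of one light-loop iteration: returns the diffuse +
// specular contribution that the shader adds to light_7.xyz.
Vec3 light_contribution(const Vec3& world_normal, const Vec3& world_pos,
                        const Vec3& eye_dir, const Vec3& light_xyz, float light_w,
                        const Vec3& diffuse, const Vec3& specular) {
    // light_w == 0: direction only; light_w == 1: vector from surface to light.
    const Vec3 to_light = normalize3({light_xyz[0] - light_w * world_pos[0],
                                      light_xyz[1] - light_w * world_pos[1],
                                      light_xyz[2] - light_w * world_pos[2]});
    const float n_dot_l = dot3(world_normal, to_light);
    const Vec3 half_vec = normalize3({to_light[0] + eye_dir[0],
                                      to_light[1] + eye_dir[1],
                                      to_light[2] + eye_dir[2]});
    // Same gate as float(tmpvar_9 >= 0.0): no highlight on back-facing surfaces.
    const float gate = n_dot_l >= 0.0f ? 1.0f : 0.0f;
    const float spec = std::pow(std::max(0.0f, dot3(world_normal, half_vec)) * gate, 32.0f);
    const float diff = std::max(0.0f, n_dot_l);
    return {diff * diffuse[0] + spec * specular[0],
            diff * diffuse[1] + spec * specular[1],
            diff * diffuse[2] + spec * specular[2]};
}
```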

For now I have to HACK around it on the CPU: I do the dot() check there and pre-apply the sign to the blend weights, and I remove the dot() operation from the shader.

Then the animation result turns out fine:

#if DQ_GPU_SKINNING_HACK
// this C/C++ code modifies the blend weights before each draw call
...
if( dq0.real.dotProduct(dq.real) < 0 )
    weight.weight[i] = -fWeight;
else
    weight.weight[i] = fWeight;
...
#endif

To do this, I had to change the "blend weight" vertex attribute from "normalized unsigned byte" to "half float", since an unsigned normalized byte cannot store a negative weight.
This is only a HACK, and it requires uploading new "blend weights" vertex attribute data to GLES on every draw call. That's not acceptable.

It only helps me find what the problem is.
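A self-contained C++ version of this CPU hack might look like the sketch below. The bone_real array, the signed_weights name and the four-bones-per-vertex layout are assumptions for illustration; the post only shows the dotProduct test. The returned weights can be negative, which is why they need a signed vertex format such as half float, and they have to be recomputed whenever the bone poses change.

```cpp
#include <array>
#include <cassert>

using Quat = std::array<float, 4>;  // real part of a bone dual quaternion

float dot4(const Quat& a, const Quat& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Hypothetical helper: negate each weight whose bone lies in the opposite
// hemisphere from the vertex's first bone, so the shader can accumulate with
// a plain weighted sum (no dot(), no branch).
std::array<float, 4> signed_weights(const Quat* bone_real,
                                    const std::array<int, 4>& bones,
                                    const std::array<float, 4>& weights) {
    std::array<float, 4> out = weights;
    const Quat& q0 = bone_real[bones[0]];
    for (int i = 1; i < 4; ++i) {
        if (dot4(q0, bone_real[bones[i]]) < 0.0f) out[i] = -weights[i];
    }
    return out;
}
```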

crazii
Join Date: 8 Sep 14
Posts: 13
Posted: Tue, 2015-06-23 20:26

Now I believe that the Adreno 3xx devices, or their driver/shader compiler, have a bug with dynamic branching on UNIFORM ARRAYS.

Maybe the optimizer treats the uniform array as a SINGLE UNIFORM, so that all shader units take the same branch.

Can anyone verify this problem? I've linked an APK in another thread; I'm putting it here too in case you need to test this APK and OBB:

https://drive.google.com/folderview?id=0B-jwAxcRPTTafmNob2l0OXRRR1VGUTJFTkNjNFFTUXhtUXpLUWRBUlNRZWtiQmx5enZfQWM&usp=sharing

Note: put the OBB in the root folder of the SD card, and the APK will work.

XProger
Join Date: 17 Apr 16
Posts: 1
Posted: Sun, 2016-04-17 14:03
Hi, sorry for necro-posting, but I have a solution!
I had the same problem with the same task (skinning with dual quaternions). My device was a OnePlus One running CyanogenMod 12.1.1.
I suspect the implementation of dot(vec4, vec4) is wrong on the Adreno 330, because
float d = r0.x * r1.x + r0.y * r1.y + r0.z * r1.z + r0.w * r1.w;
gave me the correct result, whereas dot(r0, r1) did not.

P.S. Simply negating the weight (weight[i] *= -1.0) is more efficient.
P.P.S. You can also use weight[i] *= step(0.0, d) * 2.0 - 1.0 to avoid branching, but I'm not sure about its efficiency.
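Both suggestions are easy to emulate on the CPU. The sketch below spells out the dot product by hand and computes step(0.0, d) * 2.0 - 1.0, which gives -1.0 for d < 0 and +1.0 otherwise, the same mapping as sign(sign(d) + 0.5). glsl_step, manual_dot and hemisphere_factor are hypothetical names; whether the manual dot actually sidesteps the driver issue can only be confirmed on the affected devices.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Hypothetical emulation of GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0.
float glsl_step(float edge, float x) { return x < edge ? 0.0f : 1.0f; }

// The workaround: write the 4-component dot product out by hand instead of dot().
float manual_dot(const Vec4& a, const Vec4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Branchless hemisphere factor from the P.P.S.: -1.0 when d < 0, else +1.0.
float hemisphere_factor(float d) { return glsl_step(0.0f, d) * 2.0f - 1.0f; }
```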

 

crazii
Join Date: 8 Sep 14
Posts: 13
Posted: Fri, 2016-05-20 18:23

Thanks for your solution!
I'll try the workaround when possible.


Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries (“Qualcomm”). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.