Hello,
In my game I use large VBOs for interleaved vertex data (e.g. position, normal, uv0, uv1, color),
and I also use VBOs for primitive indices.
I noticed the following behavior on a Galaxy S4 (i9505, Adreno 320):
Most of my draw calls are fast (0.1 to 0.2 ms per draw call).
However, some draw calls are very slow (2 to 8 ms).
After some debugging, it seems that the following "rule" causes a slow draw call:
1) The vertex VBO size is > 64 KB
2) The draw call references only a subset of the vertices in the VBO, but it references some vertices near the beginning of the VBO AND some near the end
Here is an example:
I have a vertex VBO holding, let's say, 30k vertices (size > 64 KB).
I do a draw call for only 2000 primitives from this VBO (so I draw only a subset of what the VBO contains).
=> the draw call is slow as soon as it references vertices near the start AND near the end of the VBO at the same time.
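To find which draw calls match this pattern, one could compute the byte span between the lowest and highest vertex referenced by a draw's index buffer. This is a hypothetical diagnostic helper, not part of any GL API; it assumes 16-bit indices and a 44-byte interleaved vertex (float3 position, float3 normal, float2 uv0, float2 uv1, RGBA8 color), and the 64 KB threshold is only the empirical value observed above, not a documented driver limit.

```c
#include <stddef.h>
#include <stdint.h>

/* Empirical threshold from the observation above (assumption, not a spec value). */
#define SUSPECT_SPAN_BYTES 65536u

/* Bytes of the VBO touched by a draw: (max index - min index + 1) * stride. */
static size_t referenced_span_bytes(const uint16_t *indices, size_t count,
                                    size_t stride)
{
    if (count == 0) return 0;
    uint16_t lo = indices[0], hi = indices[0];
    for (size_t i = 1; i < count; ++i) {
        if (indices[i] < lo) lo = indices[i];
        if (indices[i] > hi) hi = indices[i];
    }
    return ((size_t)hi - lo + 1) * stride;
}

/* Flags a draw whose referenced vertex range spans more than the threshold,
 * even if it only uses a few vertices at each end of that range. */
static int draw_is_suspect(const uint16_t *indices, size_t count, size_t stride)
{
    return referenced_span_bytes(indices, count, stride) > SUSPECT_SPAN_BYTES;
}
```

With the 30k-vertex example, a draw that touches both index 0 and index 29999 spans 30000 * 44 = 1,320,000 bytes and is flagged, even though it only draws 2000 primitives.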
If I split my large vertex VBOs into smaller ones of <= 64 KB, the draw calls are fast, but of course this greatly increases the draw call count.
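As a rough sizing aid for that workaround, the number of vertices that fit in one <= 64 KB VBO, and the number of VBOs needed, follow directly from the vertex stride. This is a hypothetical helper assuming the same 44-byte vertex; the extra draw calls come from having to split every draw whose vertices cross a chunk boundary.

```c
#include <stddef.h>

/* Per-VBO size limit used by the split workaround described above. */
#define MAX_VBO_BYTES 65536u

/* How many whole vertices of the given stride fit in one VBO (0 if none). */
static size_t vertices_per_chunk(size_t stride)
{
    if (stride == 0 || stride > MAX_VBO_BYTES) return 0;
    return MAX_VBO_BYTES / stride;
}

/* Number of VBOs needed to hold vertex_count vertices (0 if stride is unusable). */
static size_t chunk_count(size_t vertex_count, size_t stride)
{
    size_t per = vertices_per_chunk(stride);
    if (per == 0) return 0;
    return (vertex_count + per - 1) / per; /* round up */
}
```

For 30,000 vertices at 44 bytes each this gives 1489 vertices per VBO and 21 VBOs.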
Do you have a technical explanation for this? I am really puzzled... The only explanation I can think of is that the driver actually splits the draw call into multiple sub-draw calls when the VBO is too large.
Note that the same behavior (performance issues with VBOs larger than 64 KB) seems to occur on Mali GPUs, but never on PowerVR or Tegra GPUs.
Thanks
After some more investigation, I found the cause of the problem: glVertexAttrib.
My shaders use some attributes such as Color or textureCoord3, but for some specific meshes those attributes are not present in the data.
In that case, instead of using glVertexAttribPointer, I use glVertexAttrib to pass a constant value (e.g. white if the color attribute data is missing from the mesh).
So, at least on the Adreno 320 in the Galaxy S4 (i9505), doing this when many vertices are involved in the draw call causes a performance bottleneck.
So this is now solved (by making sure shaders do not use attributes that are missing from the mesh data, or by changing the shaders to not use that data). However, this does not explain why it happens on Adreno GPUs while working perfectly on PowerVR or Tegra, so I would still be interested in any technical insight you have about this.
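The fix described above (never letting a shader read an attribute the mesh does not supply) can be enforced at load time by comparing attribute bitmasks. This is a minimal sketch with hypothetical attribute bits and a hypothetical list of shader variants; in a real engine the masks would come from the mesh's vertex format and the program's active attributes.

```c
#include <stdint.h>

/* Hypothetical attribute bits. */
enum {
    ATTR_POSITION  = 1u << 0,
    ATTR_NORMAL    = 1u << 1,
    ATTR_UV0       = 1u << 2,
    ATTR_UV1       = 1u << 3,
    ATTR_COLOR     = 1u << 4,
    ATTR_TEXCOORD3 = 1u << 5
};

/* Attributes the shader reads but the mesh does not supply. Each set bit is a
 * case that would otherwise fall back to a constant glVertexAttrib value. */
static uint32_t missing_attribs(uint32_t shader_mask, uint32_t mesh_mask)
{
    return shader_mask & ~mesh_mask;
}

/* Return the first variant whose inputs are all provided by the mesh, or -1.
 * Variants are assumed to be ordered from most to least featured. */
static int pick_variant(const uint32_t *variant_masks, int n, uint32_t mesh_mask)
{
    for (int i = 0; i < n; ++i)
        if (missing_attribs(variant_masks[i], mesh_mask) == 0) return i;
    return -1;
}
```

A mesh without colors would then be drawn with a variant that does not read Color at all, instead of a shader fed a constant white through glVertexAttrib.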
Thanks!
There are specific optimizations in the way VBO caching is implemented. We recommend that all attributes, and only the attributes actually used, be stored in VBOs, that attributes be interleaved, and that VBO updates occur after the swap and before the depth buffer clear.
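Following that recommendation, the interleaved layout can be built from the set of attributes the shader actually uses, so unused attributes never occupy VBO space. This is a sketch with hypothetical attribute sizes (float3 position and normal, float2 UVs, RGBA8 color); the resulting offset and stride are the values one would pass to glVertexAttribPointer for each enabled attribute.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t bit; size_t size; } AttribDesc;

/* Table order is the order of attributes inside one interleaved vertex. */
static const AttribDesc kAttribs[] = {
    {1u << 0, 12}, /* position: 3 x float */
    {1u << 1, 12}, /* normal:   3 x float */
    {1u << 2,  8}, /* uv0:      2 x float */
    {1u << 3,  8}, /* uv1:      2 x float */
    {1u << 4,  4}, /* color:    4 x unsigned byte */
};
#define NUM_ATTRIBS (sizeof kAttribs / sizeof kAttribs[0])

/* Write each attribute's byte offset into offsets[] ((size_t)-1 if unused)
 * and return the stride of the packed vertex. */
static size_t interleaved_layout(uint32_t used_mask, size_t offsets[NUM_ATTRIBS])
{
    size_t stride = 0;
    for (size_t i = 0; i < NUM_ATTRIBS; ++i) {
        if (used_mask & kAttribs[i].bit) {
            offsets[i] = stride;
            stride += kAttribs[i].size;
        } else {
            offsets[i] = (size_t)-1;
        }
    }
    return stride;
}
```

Because every size here is a multiple of 4 bytes, each attribute stays 4-byte aligned inside the interleaved vertex.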