Vertex Processing

General

This section describes tips and tricks that can help optimize the way Vulkan applications organize vertex data so that the rendering process can run efficiently on Adreno architectures.

Use interleaved, compressed vertices

For vertex fetch performance, interleaved vertex attributes (“xyz uv | xyz uv | …” rather than “xyz | xyz | … | uv | uv | …”) work most efficiently. Interleaved vertices give better throughput, and compressing the vertices improves it further. These optimizations are particularly advantageous with deferred rendering.
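As a minimal sketch of an interleaved, compressed layout, the C structures below compare a plain float vertex (position, texture coordinate, normal) with a compressed one. The compressed vertex keeps a float position but stores the texture coordinate as two 16-bit unsigned normalized integers and the normal as one packed signed 2_10_10_10 word. These format choices and type names are illustrative assumptions, not a recommended Adreno layout. In Vulkan, the stride would go into VkVertexInputBindingDescription::stride and each offset into VkVertexInputAttributeDescription::offset. Check formats such as VK_FORMAT_R16G16_UNORM and VK_FORMAT_A2B10G10R10_SNORM_PACK32 for VK_FORMAT_FEATURE_VERTEX_BUFFER_BIT with vkGetPhysicalDeviceFormatProperties before relying on them.

```c
#include <stddef.h>
#include <stdint.h>

/* Uncompressed interleaved vertex: 3 + 2 + 3 floats = 32 bytes. */
typedef struct {
    float position[3];
    float uv[2];
    float normal[3];
} VertexF32;

/* Compressed interleaved vertex (illustrative format choices):
 * - position stays 32-bit float for precision,
 * - uv uses 16-bit unsigned normalized integers (e.g. VK_FORMAT_R16G16_UNORM),
 * - normal is packed into one 32-bit word as signed 2_10_10_10
 *   (e.g. VK_FORMAT_A2B10G10R10_SNORM_PACK32). */
typedef struct {
    float    position[3];
    uint16_t uv[2];
    uint32_t normal;
} VertexPacked;

/* Stride and per-attribute offsets a vertex input description would use. */
enum {
    PACKED_STRIDE     = sizeof(VertexPacked),
    PACKED_OFFSET_POS = offsetof(VertexPacked, position),
    PACKED_OFFSET_UV  = offsetof(VertexPacked, uv),
    PACKED_OFFSET_NRM = offsetof(VertexPacked, normal)
};
```

On common ABIs the compressed vertex is 20 bytes instead of 32, so each vertex fetch moves 37.5% less data while all attributes of a vertex stay adjacent in memory.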

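For binning pass optimization, keep in mind that the binning pass only needs vertex positions, so it fetches less data when position-related attributes are stored apart from the rest. Consider:

One array containing the vertex position and any other attributes needed to compute position, and
Another interleaved array containing the remaining attributes.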
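A minimal sketch of that split, assuming the only attribute needed to compute position is the position itself (a skinned mesh, for example, would also need its bone indices and weights in the first array). The type names and split_streams are hypothetical. In Vulkan, the two arrays would typically be bound as two vertex buffer bindings, each described by its own VkVertexInputBindingDescription.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source vertex with every attribute interleaved. */
typedef struct {
    float    position[3];
    uint16_t uv[2];
    uint32_t normal;
} SourceVertex;

/* Array 1: only what the binning pass needs to compute position. */
typedef struct {
    float position[3];
} PositionVertex;

/* Array 2: the remaining attributes, still interleaved with each other. */
typedef struct {
    uint16_t uv[2];
    uint32_t normal;
} AttributeVertex;

/* Split one interleaved array into a position array and an attribute array.
 * Both output arrays must have room for `count` elements. */
static void split_streams(const SourceVertex *src, size_t count,
                          PositionVertex *pos_out, AttributeVertex *attr_out)
{
    for (size_t i = 0; i < count; ++i) {
        for (int c = 0; c < 3; ++c)
            pos_out[i].position[c] = src[i].position[c];
        attr_out[i].uv[0]  = src[i].uv[0];
        attr_out[i].uv[1]  = src[i].uv[1];
        attr_out[i].normal = src[i].normal;
    }
}
```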

There is currently no support for half floats in Vulkan shaders.

Note

Under OpenGL ES 2.0, the Adreno OpenGL ES implementation supports the GL_OES_vertex_half_float extension, which allows programmers to fill vertex buffers with half float coordinates. This functionality became a core feature of OpenGL ES 3.0. Half floats can be used for texture coordinates or normals, where the lower precision does not hurt the final imagery, and they improve rendering performance.
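C has no portable half float type, so filling such a buffer usually means converting 32-bit floats on the CPU. The sketch below converts to IEEE 754 binary16, the format that GL_HALF_FLOAT vertex data uses, with round-to-nearest-even. float_to_half is a hypothetical helper name, not part of any GL API.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 single-precision float to binary16 (half float),
 * rounding to nearest even. Handles signed zeros, subnormals,
 * overflow to infinity, and NaN. */
static uint16_t float_to_half(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                 /* reinterpret bits safely */

    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp == 0xFFu)                         /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;      /* rebias the exponent */

    if (e >= 0x1F)                            /* too large: infinity */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                             /* half subnormal or zero */
        if (e < -10)
            return (uint16_t)sign;            /* underflows to signed zero */
        mant |= 0x800000u;                    /* restore the implicit 1 bit */
        uint32_t shift   = (uint32_t)(14 - e);
        uint32_t h       = mant >> shift;
        uint32_t rem     = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (h & 1u)))
            h++;                              /* may round up to smallest normal */
        return (uint16_t)(sign | h);
    }

    uint32_t h   = sign | ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;            /* the 13 discarded bits */
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++;                                  /* carry may bump the exponent */
    return (uint16_t)h;
}
```

The resulting 16-bit values can be uploaded directly and described with glVertexAttribPointer using GL_HALF_FLOAT on OpenGL ES 3.0, or GL_HALF_FLOAT_OES with the GL_OES_vertex_half_float extension.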

Consider geometry instancing

Geometry instancing renders multiple copies of the same mesh or geometry in a scene at once. This technique is used primarily for objects like trees, grass, or buildings that can be represented as repeated geometry without appearing unduly repetitive. However, geometry instancing can also be used for characters.

Although all instances share the same vertex data, each instance can vary other parameters, e.g., color, transforms, or lighting, to reduce the appearance of repetition.

As shown in the image below, all barrels in the scene could use a single set of vertex data that is instanced multiple times instead of using unique geometry for each one.

Geometry instancing offers significant savings in memory usage. It allows the GPU to draw the same geometry multiple times, with different positions, in a single draw call, while storing only one copy of the vertex data, one copy of the index data, and an additional stream containing one transform per instance. Without instancing, the GPU would have to store multiple copies of the vertex and index data.
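The memory comparison above can be made concrete with a back-of-the-envelope model. This sketch assumes, as the paragraph does, that the non-instanced alternative stores a full copy of the vertex and index data per object so everything can still go out in one draw call; the function names are hypothetical.

```c
#include <stddef.h>

/* Bytes for `instances` copies of one mesh WITHOUT instancing:
 * every copy carries its own vertex and index data. */
static size_t bytes_without_instancing(size_t vertices, size_t vertex_stride,
                                       size_t indices, size_t index_size,
                                       size_t instances)
{
    return instances * (vertices * vertex_stride + indices * index_size);
}

/* Bytes WITH instancing: one copy of the vertex and index data plus a
 * per-instance stream holding one transform per instance. */
static size_t bytes_with_instancing(size_t vertices, size_t vertex_stride,
                                    size_t indices, size_t index_size,
                                    size_t instances, size_t transform_size)
{
    return vertices * vertex_stride + indices * index_size
         + instances * transform_size;
}
```

For a hypothetical barrel mesh of 500 vertices with a 20-byte stride and 2,400 16-bit indices, drawn 100 times with a 64-byte 4×4 float matrix per instance, the first function gives 1,480,000 bytes and the second 21,200 bytes.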

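vkCmdDraw and vkCmdDrawIndexed support instanced rendering through their instanceCount and firstInstance parameters. The indirect variants, vkCmdDrawIndirect and vkCmdDrawIndexedIndirect, support it as well, reading the same parameters from VkDrawIndirectCommand and VkDrawIndexedIndirectCommand structures stored in a buffer.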
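The same five parameters appear in vkCmdDrawIndexed(commandBuffer, indexCount, instanceCount, firstIndex, vertexOffset, firstInstance) and in VkDrawIndexedIndirectCommand. The sketch below fills an array of indexed indirect commands, one per mesh batch, advancing firstInstance so each batch reads its own range of a per-instance vertex stream (a binding declared with VK_VERTEX_INPUT_RATE_INSTANCE). The structure only mirrors the Vulkan layout so the sketch compiles without the SDK; real code would include <vulkan/vulkan.h> and use the real type. MeshBatch and build_commands are hypothetical. Non-zero firstInstance values in indirect commands require the drawIndirectFirstInstance device feature.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the field layout of VkDrawIndexedIndirectCommand. */
typedef struct {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
} DrawIndexedIndirectCommand;

/* Hypothetical description of one instanced mesh batch. */
typedef struct {
    uint32_t index_count;
    uint32_t first_index;
    int32_t  vertex_offset;
    uint32_t instance_count;
} MeshBatch;

/* Fill one command per batch. firstInstance advances so each batch reads a
 * contiguous range of the per-instance stream. Returns the total number of
 * instances, i.e. how many per-instance records the stream must hold. */
static uint32_t build_commands(const MeshBatch *batches, size_t count,
                               DrawIndexedIndirectCommand *out)
{
    uint32_t next_instance = 0;
    for (size_t i = 0; i < count; ++i) {
        out[i].indexCount    = batches[i].index_count;
        out[i].instanceCount = batches[i].instance_count;
        out[i].firstIndex    = batches[i].first_index;
        out[i].vertexOffset  = batches[i].vertex_offset;
        out[i].firstInstance = next_instance;
        next_instance += batches[i].instance_count;
    }
    return next_instance;
}
```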

OpenGL ES Specific

Use Z-only rendering

The GPU has a special mode that writes Z-only pixels at twice the normal rate, for example when an application renders to a shadow map. The driver must tell the hardware to enter this mode, and because OpenGL ES has no dedicated state for it, the driver relies on hints from the application. Using an empty fragment shader and disabling color writes through the framebuffer write masks (for example, with glColorMask) are good hints.

Some developers take advantage of double-speed, Z-only rendering by laying down a Z-prepass before rendering the main scene. Performance tests must still be run to determine if this is beneficial on Adreno.

Select the best vertex format

The Adreno GPU provides hardware support for the following vertex formats:

GL_BYTE and GL_UNSIGNED_BYTE
GL_SHORT and GL_UNSIGNED_SHORT
GL_FIXED
GL_FLOAT
GL_HALF_FLOAT
GL_INT_2_10_10_10_REV and GL_UNSIGNED_INT_2_10_10_10_REV

When preparing vertex data sets for optimal performance, always use the vertex format that provides a satisfactory level of precision and takes the least amount of space.
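As one example of trading precision for space, a unit normal can be packed into a single 32-bit GL_INT_2_10_10_10_REV value instead of three GL_FLOAT values (4 bytes instead of 12). The sketch below assumes signed normalized packing with round-to-nearest; the helper names are hypothetical.

```c
#include <stdint.h>

/* Convert a float in [-1, 1] to a signed 10-bit normalized integer
 * in [-511, 511], rounding to nearest. */
static int32_t to_snorm10(float v)
{
    if (v > 1.0f)  v = 1.0f;
    if (v < -1.0f) v = -1.0f;
    float scaled = v * 511.0f;
    return (int32_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Pack x, y, z into the GL_INT_2_10_10_10_REV layout:
 * x in bits 0-9, y in bits 10-19, z in bits 20-29, w in bits 30-31.
 * Negative values are stored as 10-bit two's complement; w is left at 0. */
static uint32_t pack_snorm_2_10_10_10_rev(float x, float y, float z)
{
    uint32_t px = (uint32_t)to_snorm10(x) & 0x3FFu;
    uint32_t py = (uint32_t)to_snorm10(y) & 0x3FFu;
    uint32_t pz = (uint32_t)to_snorm10(z) & 0x3FFu;
    return px | (py << 10) | (pz << 20);
}
```

The packed attribute would then be described with glVertexAttribPointer(location, 4, GL_INT_2_10_10_10_REV, GL_TRUE, stride, offset). OpenGL ES 3.0 requires a size of 4 for the packed types, and the w component reads as 0 here because its two bits are left clear.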

Use indirect indexed draw calls

Introduced in OpenGL ES 3.1, indirect draw calls move the draw call overhead from the CPU to the GPU. This can provide a significant performance benefit over regular draw calls on the Adreno architecture.

If an application targets OpenGL ES 3.1 or later, consider using indirect draw calls for improved rendering efficiency. For example, if the renderer is based on a scene graph, it can cache the draw call arguments needed to render the mesh nodes in a buffer object at load time. At render time, that buffer is bound to GL_DRAW_INDIRECT_BUFFER and used as the input to glDrawArraysIndirect or glDrawElementsIndirect.
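A minimal sketch of the load-time half of that idea. The command layout mirrors the DrawElementsIndirectCommand structure described in the OpenGL ES 3.1 specification; GL headers do not declare it, so applications define it themselves. MeshNode, record_commands, and command_offset are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout read by glDrawElementsIndirect (OpenGL ES 3.1). */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t firstIndex;          /* in indices, not bytes */
    int32_t  baseVertex;
    uint32_t reservedMustBeZero;
} DrawElementsIndirectCommand;

/* Hypothetical mesh node from a scene graph, visited at load time. */
typedef struct {
    uint32_t index_count;
    uint32_t first_index;
    int32_t  base_vertex;
} MeshNode;

/* Load time: record one command per mesh node into `out`. */
static void record_commands(const MeshNode *nodes, size_t count,
                            DrawElementsIndirectCommand *out)
{
    for (size_t i = 0; i < count; ++i) {
        out[i].count              = nodes[i].index_count;
        out[i].instanceCount      = 1;
        out[i].firstIndex         = nodes[i].first_index;
        out[i].baseVertex         = nodes[i].base_vertex;
        out[i].reservedMustBeZero = 0;
    }
}

/* Render time: byte offset of node i's command inside the buffer bound to
 * GL_DRAW_INDIRECT_BUFFER, passed as the `indirect` argument. */
static size_t command_offset(size_t node_index)
{
    return node_index * sizeof(DrawElementsIndirectCommand);
}
```

At render time, the recorded array would be uploaded once to a buffer bound to GL_DRAW_INDIRECT_BUFFER, and each node drawn with glDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_SHORT, (const void *)command_offset(i)), assuming 16-bit indices. OpenGL ES 3.1 has no multi-draw indirect entry point, so each node still needs its own call, but its arguments already reside in GPU-accessible memory.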