Qualcomm® Adreno™ GPU
Adreno GPUs are integrated into the system-on-chip design of Qualcomm® Snapdragon™ processors. They provide the rendering capabilities needed by the latest games, user interfaces, and web technologies on today's mobile devices. They are purpose-built for mobile APIs and device constraints, with an emphasis on performance and power efficiency.
This guide outlines the various technologies and subsystems provided by the Adreno GPU to support the graphics developer. Best practices are discussed in the Best Practices section.
This document covers a broad range of topics relevant to various Adreno GPUs. As the architecture evolves, new functionality is added to the GPU, so some sections of this guide apply only to the Adreno GPUs that support the feature being described.
Where relevant, the guide names the Adreno GPU series in which a feature is present. If no series is specified, you can assume the functionality is available on most or all Adreno GPUs.
- Overview
- Best Practices
- General
- Shaders
- General
- Use built-ins
- Use the appropriate data type
- Reduce type casting
- Pack scalar constants
- Keep shader length reasonable
- Sample textures in an efficient way
- Threads in flight/dynamic branching
- Pack shader interpolators
- Minimize usage of shader GPRs
- Minimize shader instruction count
- Avoid uber-shaders
- Avoid math on shader constants
- Avoid discarding pixels in the fragment shader
- Avoid modifying depth in fragment shaders
- Avoid texture fetches in vertex shaders
- Break up draw calls
- Use medium precision where possible
- Favor vertex shader calculations over fragment shader calculations
- Measure, test, and verify results
- Prefer uniform buffers over shader storage buffers
- Eliminate subpixel triangles during tessellation
- Do back-face culling during tessellation
- Disable tessellation whenever it is not needed
- Keep UBOs as small as possible
- OpenGL ES Specific
- General
- Texture
- Tiling Architecture
- Vertex Processing
- Frequently Asked Questions
- General
- What is the optimal way to sort objects? Is front-to-back object submission needed or, given the tiling architecture, is that not necessary?
- Are there “no-copy” paths available for other Snapdragon hardware blocks?
- What is a good TEX to ALU ratio?
- What is the triangle setup rate?
- Which is better: vertex stream or attribute fetching in VS?
- Is dedicated video memory possible?
- How many Occlusion Queries should I use?
- What is the Occlusion Query performance?
- How are timer queries calculated in Adreno GPUs?
- What is the performance of user clip planes?
- Which performs better: alpha test or alpha blend?
- What is the best approach to single-pass stereo? Texture arrays? Is a geometry shader needed?
- What is the behavior of dynamic branching in fragment shaders?
- LRZ
- Textures and formats
- Tiling architecture
- Do you have any details of how the tiling and binning process works?
- What conditions trigger direct rendering with FlexRender?
- Is the full Vertex Shader used when performing binning?
- Is binning affected by fragment Z occlusion?
- What is the CPU cost of binning?
- If a primitive spans multiple tiles, will the GPU insert synthetic vertices at tile boundaries?
- Vulkan
- Is there a performance impact of using Vulkan Secondary Command Buffers?
- Is there a performance benefit or cost to using push constants?
- Static vs. dynamic state?
- What is the recommended usage: SSBO vs. UBO vs. texture fetch?
- What is the recommended sampler type?
- How do I ensure Vulkan subpasses are merged properly?
- General
- Spec Sheets