Overview

Visibility processing

Early Z rejection

Early Z rejection is a fast occlusion method that rejects unwanted rendering work for objects that are not visible (hidden) from the view position. Adreno GPUs can reject occluded pixels at up to 4x the drawn pixel fill rate.

The image below shows a color buffer represented as a grid, with each block representing a pixel. The rendered pixel area on this grid is colored black. The Z-buffer value for these rendered black pixels is 1.

If you then render a new primitive with a Z-buffer value of 2 (second grid with green blocks) onto the same pixels of the existing color buffer, the conflicting pixels in the new primitive are rejected (third grid representing the final color buffer).

Early Z rejection

To get the maximum benefit of this feature, we recommend drawing a scene with primitives sorted front to back (i.e., near to far). This ensures that the Z-reject rate is higher for the far primitives, which is useful for applications with high depth complexity.
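The effect of draw order can be illustrated with a toy model. The sketch below is not Adreno hardware behavior; it assumes a "less than" depth test, a hypothetical 4x4 buffer, and three full-screen layers, and simply counts how many fragments survive the depth test (and would therefore be shaded) for each ordering.

```python
def count_shaded_fragments(draws, width=4, height=4):
    """Count fragments that pass a LESS depth test for a list of draws.

    Each draw is (depth, pixels): one depth value plus the set of (x, y)
    pixels it covers. Toy model only, not the actual hardware pipeline.
    """
    zbuf = {(x, y): float("inf") for x in range(width) for y in range(height)}
    shaded = 0
    for depth, pixels in draws:
        for p in pixels:
            if depth < zbuf[p]:   # depth test passes
                zbuf[p] = depth
                shaded += 1       # the fragment shader would run here
            # otherwise the fragment is rejected before shading
    return shaded


full_screen = {(x, y) for x in range(4) for y in range(4)}
layers = [(1.0, full_screen), (2.0, full_screen), (3.0, full_screen)]

front_to_back = count_shaded_fragments(sorted(layers, key=lambda d: d[0]))
back_to_front = count_shaded_fragments(
    sorted(layers, key=lambda d: d[0], reverse=True))
print(front_to_back, back_to_front)
```

With the near layer drawn first, only its 16 fragments are shaded; drawn last, all three layers are shaded in full.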

FlexRender™ technology (Hybrid Deferred and Direct Rendering mode)

FlexRender is a feature of Adreno GPUs, introduced with Adreno 3X (A3X). It refers to their ability to switch between indirect rendering (i.e., binning or tiled rendering) and direct rendering to the frame buffer.

There are advantages to both direct and deferred rendering modes. The Adreno GPUs were designed to maximize performance by switching between the two modes in a dynamic fashion. The driver and GPU analyze the rendering parameters for a given render target and select the mode automatically.

Tile-based rendering

To optimize rendering for low-power and memory-bandwidth-limited devices, Adreno GPUs use a tile-based rendering architecture. This rendering mechanism breaks the scene frame buffer into small rectangular regions for rendering. Region sizes are automatically determined so that they are optimally rendered using local, low-latency memory on the GPU (referred to as GMEM), rather than using a bandwidth-limited bus to system memory.

The deferred mode rendering mechanism of the Adreno GPU also uses a tile-based rendering architecture. It implements a binning approach to create bins of primitives that are processed in each tile.

The Adreno GPU divides a frame into bins and renders them one at a time. During rendering, it uses on-chip, high-performance Graphics Memory (GMEM) to avoid the cost of going to system memory.

In the image below, you see the two passes that are performed over the graphic primitives (Binning and Rendering). In this example, there are three triangles that will be rendered in the frame buffer. The Binning Pass marks which bins a triangle is visible in (the visibility stream). This stream is stored to system memory.

Binning Overview

In the Rendering Pass, only the visible primitives for each tile to be rendered are processed by reading the visibility stream. The primitives are rendered using GMEM as a local color buffer and Z-buffer. Once the rendering is complete for the tile, the GMEM color contents are sent back (resolved) to system memory. This process is repeated for all bins.
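The two passes can be sketched in a few lines. This is a simplified illustration, not the driver's algorithm: it assumes integer pixel coordinates, a hypothetical 64x64 frame split into 32x32 bins, and a conservative bounding-box test to decide which bins a triangle touches. The `visibility` dictionary stands in for the visibility stream.

```python
def bin_triangles(triangles, frame_w, frame_h, bin_w, bin_h):
    """Binning pass: list, per bin, the triangles whose bounding box touches it."""
    cols = (frame_w + bin_w - 1) // bin_w
    rows = (frame_h + bin_h - 1) // bin_h
    visibility = {(c, r): [] for r in range(rows) for c in range(cols)}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the covered bin range to the frame.
        c0 = max(0, int(min(xs) // bin_w))
        c1 = min(cols - 1, int(max(xs) // bin_w))
        r0 = max(0, int(min(ys) // bin_h))
        r1 = min(rows - 1, int(max(ys) // bin_h))
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                visibility[(c, r)].append(index)
    return visibility


# Three hypothetical triangles, each a list of (x, y) vertices.
triangles = [
    [(2, 2), (20, 4), (10, 25)],     # inside the top-left bin
    [(40, 5), (60, 10), (50, 28)],   # inside the top-right bin
    [(20, 40), (50, 45), (30, 60)],  # spans both bottom bins
]
visibility = bin_triangles(triangles, 64, 64, 32, 32)

# Rendering pass: each bin only processes the triangles listed for it.
for bin_id in sorted(visibility):
    print(bin_id, visibility[bin_id])

work_with_binning = sum(len(v) for v in visibility.values())
work_without_binning = len(triangles) * len(visibility)
```

In this example the rendering pass touches 4 triangle/bin pairs instead of the 12 it would process if every bin walked every triangle.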

Vulkan’s Renderpass feature is highly advantageous for tiling architectures like Adreno, because multiple rendering passes can be done in GMEM. This ultimately minimizes costly resolve operations.

Binning

Low Resolution Z pass

A Low Resolution Z (LRZ) pass was added with Adreno 5X (A5X). This pass is also referred to as draw-order-independent depth rejection. During the binning pass, a low-resolution Z-buffer is constructed, which can reject LRZ-tile-wide contributions to boost binning performance. This LRZ is then used during the rendering pass to reject pixels efficiently before testing against the full-resolution Z-buffer.

Warning

Certain conditions hint to the driver that LRZ should be disabled, including:

  • Writing depth in fragment shader

  • Use of secondary command buffers (Vulkan)

  • Any condition where direct rendering is required

This feature has the advantages of reducing memory access, reducing rendered primitives, not requiring an application to draw front to back, and allowing for an improved frame rate.
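The idea behind a coarse depth test can be sketched as follows. This is only a conceptual model: the actual LRZ tile size, storage format, and construction are not described here, so the sketch assumes a "less than" depth test, a hypothetical 2x2-pixel LRZ tile, and a low-resolution buffer that stores the farthest depth in each tile. A primitive whose nearest point is at or behind that value cannot pass the depth test anywhere in the tile.

```python
def build_lrz(depth, tile):
    """Downsample a full-resolution depth grid to the farthest depth per tile."""
    height, width = len(depth), len(depth[0])
    lrz = []
    for ty in range(0, height, tile):
        row = []
        for tx in range(0, width, tile):
            row.append(max(depth[y][x]
                           for y in range(ty, min(ty + tile, height))
                           for x in range(tx, min(tx + tile, width))))
        lrz.append(row)
    return lrz


def lrz_rejects(lrz, tile_x, tile_y, prim_min_depth):
    """True when the primitive is behind everything already in the tile."""
    return prim_min_depth >= lrz[tile_y][tile_x]


# 4x4 depth buffer: a near occluder (0.2) fills the top-left 2x2 tile,
# the rest is still at the far plane (1.0).
depth = [
    [0.2, 0.2, 1.0, 1.0],
    [0.2, 0.2, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
lrz = build_lrz(depth, tile=2)
print(lrz)
print(lrz_rejects(lrz, 0, 0, 0.5))  # hidden behind the occluder
print(lrz_rejects(lrz, 1, 0, 0.5))  # still potentially visible
```

Because the test is conservative, a tile that is not rejected here still goes through the full-resolution depth test.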

Shader support

Unified shader architecture

All Adreno GPUs support the Unified Shader Model, which allows using a consistent instruction set across all shader types (vertex and fragment shaders). In hardware terms, Adreno GPUs have computational units (e.g., Arithmetic Logic Units (ALUs)) that support both fragment and vertex shaders.

Adreno GPUs use a shared resource architecture that allows the same ALU and fetch resources to be shared by the vertex shaders, pixel or fragment shaders, and general purpose processing.

Shader processing is done within the unified shader architecture, as shown in the following image. This image shows that vertices and pixels are processed in groups of four as a vector, or a thread. When a thread stalls, the shader ALUs can be reassigned.

Unified Shader Architecture

In unified shader architecture, there is no separate hardware for the vertex and fragment shaders, as shown in the image below. This allows for greater flexibility of pixel and vertex load balances.

Unified Shader Architecture (2)

The Adreno shader architecture is also multithreaded. If a fragment shader execution stalls due to a texture fetch, the execution is given to another shader. Multiple shaders are accumulated as long as there is room in the hardware.

No special steps are required to use the unified shader architecture. The Adreno GPU intelligently makes the most efficient use of the shader resources depending on scene composition.

Scalar architecture

Adreno GPUs have a scalar component architecture. The smallest component they can support natively is a scalar component. This results in more efficient hardware resource use for processing scalar components, and it does not waste a full vector component to process the scalar.

Scalar architecture can be twice as power-efficient and deliver twice the performance for processing a fragment shader that uses medium-precision 16-bit floating point (mediump) processing, compared with high-precision 32-bit (highp) floating point.
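The trade-off for that efficiency is precision. The sketch below assumes mediump maps to an IEEE 754 16-bit float, as described above (the GLSL ES specification itself only sets minimum precision requirements), and uses Python's `struct` half-float format to show what a value looks like after rounding to 16 bits.

```python
import struct


def to_half(value):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


print(to_half(0.1))      # close to 0.1, but not exact
print(to_half(2048.0))   # integers up to 2048 are represented exactly
print(to_half(2049.0))   # 2049 has no exact 16-bit representation
```

Values such as large texture coordinates or world-space positions are typical cases where this loss becomes visible, so they are usually kept in highp.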

Universal bandwidth compression

Universal bandwidth compression (UBWC) is supported by all A5X GPUs. UBWC is a unique predictive bandwidth compression scheme that improves effective throughput to system memory. By minimizing the bandwidth of data, significant power savings can be achieved.

UBWC works across many components in Snapdragon processors including GPU, Display, Video, and Camera. The compression supports YUV and RGB formats, and reduces memory bottlenecks.

Texture features

Multiple textures

Multiple texturing or multitexturing is the use of more than one texture at a time on a polygon. Adreno GPUs support up to 32 total textures in a single render pass, i.e., up to 16 textures in the fragment shader and up to 16 textures at a time for the vertex shader.

Effective use of multiple textures significantly reduces overdraw, saves ALU cost for fragment shaders, and avoids unnecessary vertex transforms.

To use multiple textures in applications, refer to the multitexture sample in the Adreno SDK for OpenGL ES.

Video textures

Adreno GPUs support video textures, which consist of moving images that are streamed in real-time from a video file. Video textures are a standard API feature in Android (Honeycomb or later versions). Refer to Android documentation for additional details on surface textures at http://developer.android.com/reference/android/graphics/SurfaceTexture.html.

In addition to the standard Android API, an application that requires video textures can use the standard OpenGL ES extension OES_EGL_image. For more information, refer to http://www.khronos.org/registry/gles/extensions/OES/OES_EGL_image.txt.

Video Texture

Texture compression

Texture compression can significantly improve the performance and load time of graphics applications because it reduces texture memory and bus bandwidth use. Compressed textures can be created using the Adreno Texture Compression and Visualization Tool and subsequently used by an OpenGL ES application.

Important compression texture formats supported by Adreno GPUs include:

  • ATC – Proprietary Adreno texture compression format (for RGB and RGBA).

  • ETC – Standard OpenGL ES 2.0 texture compression format (for RGB only).

  • ETC2 – Standard OpenGL ES 3.0 and Vulkan texture compression format supporting R, RG, RGB, and RGBA component layouts, as well as sRGB texture data.

  • ASTC – Texture compression format supported in OpenGL ES (3.0 and later) and Vulkan that allows compression to use a variable block size.

Adreno GPUs support both HDR and LDR profiles for ASTC.
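To get a feel for the memory savings, the footprint of a single mip level can be estimated from the block layout. The sketch below uses the block sizes defined by the formats themselves (ETC2 RGB: 8 bytes per 4x4 block; ETC2 RGBA: 16 bytes per 4x4 block; ASTC: always 16 bytes per block, with the block footprint ranging from 4x4 to 12x12). The 1024x1024 texture is a hypothetical example.

```python
def compressed_size(width, height, block_w, block_h, bytes_per_block):
    """Bytes needed for one mip level of a block-compressed texture."""
    blocks_x = (width + block_w - 1) // block_w   # round up to whole blocks
    blocks_y = (height + block_h - 1) // block_h
    return blocks_x * blocks_y * bytes_per_block


width = height = 1024
sizes = {
    "RGBA8 (uncompressed)": width * height * 4,
    "ETC2 RGB": compressed_size(width, height, 4, 4, 8),
    "ETC2 RGBA": compressed_size(width, height, 4, 4, 16),
    "ASTC 4x4": compressed_size(width, height, 4, 4, 16),
    "ASTC 8x8": compressed_size(width, height, 8, 8, 16),
    "ASTC 12x12": compressed_size(width, height, 12, 12, 16),
}
for name, size in sizes.items():
    print(f"{name:22s} {size / 1024:8.1f} KiB")
```

Larger ASTC blocks trade image quality for size, which is what the variable block size makes tunable per texture.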

To learn more about the use of texture compression, see the Compressed Texture tutorial in the Adreno SDK for OpenGL ES.

Floating point textures

Adreno GPUs support floating point texturing features including the following:

  • Texturing and linear filtering of FP16 textures via the GL_OES_texture_half_float and GL_OES_texture_half_float_linear extensions.

  • Texturing from FP32 textures via GL_OES_texture_float.

For a complete listing of supported texture and surface formats, refer to the Texture formats Feature Table.

Cube mapping with seamless edges

Cube mapping is a fast and inexpensive way to create advanced graphic effects such as environment mapping. Cube mapping takes a three-dimensional texture coordinate and returns a texel from a given cube map (similar to a sky box).

Adreno GPUs support seamless edges for cube map texture sampling.

Cubemap
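The lookup itself follows the face-selection rules in the OpenGL ES specification: the component with the largest magnitude picks the face, and the other two components are projected onto it. The sketch below is a software rendition of that table for illustration; the function name and the tie-breaking order for equal components are choices made here.

```python
def cube_lookup(rx, ry, rz):
    """Map a non-zero direction vector to (face, s, t) with s, t in [0, 1]."""
    ax, ay, az = abs(rx), abs(ry), abs(rz)
    if ax >= ay and ax >= az:
        face, sc, tc, ma = ("+X", -rz, -ry, ax) if rx > 0 else ("-X", rz, -ry, ax)
    elif ay >= az:
        face, sc, tc, ma = ("+Y", rx, rz, ay) if ry > 0 else ("-Y", rx, -rz, ay)
    else:
        face, sc, tc, ma = ("+Z", rx, -ry, az) if rz > 0 else ("-Z", -rx, -ry, az)
    s = 0.5 * (sc / ma + 1.0)
    t = 0.5 * (tc / ma + 1.0)
    return face, s, t


print(cube_lookup(1.0, 0.0, 0.0))    # center of the +X face
print(cube_lookup(1.0, 0.0, -1.0))   # on the edge shared by +X and -Z
```

The second direction lands at s = 1 on the +X face, while the same direction would map to s = 0 on the -Z face. A filter footprint at that position needs texels from both faces, which is the case seamless edge sampling handles.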

Large texture size

Adreno 4X (A4X) and A5X GPUs support texture sizes up to 16384x16384x16384 (depending on memory availability). A3X supports texture sizes up to 8192x8192x8192.

sRGB textures and render targets

sRGB is a standard RGB color space created cooperatively by Hewlett-Packard and Microsoft in 1996 for use on monitors, printers, and the Internet. Smartphone and tablet displays today also assume sRGB (nonlinear) color space. sRGB provides the best viewing experience with correct colors, and ensures that the color space for render targets and textures match the color space for the display.

Unfortunately, OpenGL ES assumes linear or RGB color space by default. As Adreno GPUs support sRGB color space for render targets and textures, it is possible to ensure a correct color viewing experience. Note that Vulkan fully handles sRGB in both textures and swapchain presentable images.
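The difference between the two spaces is the sRGB transfer function. The sketch below implements the standard piecewise sRGB encode and decode formulas for a single channel in the range 0 to 1; this is the conversion that hardware sRGB textures and render targets apply on read and write. The function names are chosen here for illustration.

```python
def srgb_to_linear(c):
    """Decode one sRGB-encoded channel value (0..1) to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Encode one linear channel value (0..1) to sRGB."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1.0 / 2.4) - 0.055


# Blending black and white 50/50:
naive = (0.0 + 1.0) / 2                      # averaging the encoded values
correct = linear_to_srgb((srgb_to_linear(0.0) + srgb_to_linear(1.0)) / 2)
print(naive, round(correct, 4))
```

Averaging the encoded values gives 0.5, whereas blending in linear space and re-encoding gives roughly 0.735, which is why filtering and blending on sRGB data without conversion looks too dark.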

Other supported features

Percentage Closer Filtering for depth textures

Adreno GPUs have hardware support for the OpenGL ES 3.0 and Vulkan feature of Percentage Closer Filtering (PCF). A hardware bilinear sample is fetched from the shadow map texture, which alleviates the aliasing problems that can be seen with shadow mapping in real-time applications.
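What distinguishes PCF from ordinary filtering is the order of operations: the depth comparison is done per texel first, and the comparison results are then filtered. The sketch below is a software model of a 2x2 bilinear PCF lookup, not the hardware implementation; it assumes texel-space coordinates (texel centers at integer + 0.5), clamp-to-edge addressing, and a "less than or equal" comparison.

```python
import math


def pcf_bilinear(shadow_map, u, v, ref_depth):
    """Return the lit fraction (0..1) from a 2x2 compare-then-filter lookup."""
    height, width = len(shadow_map), len(shadow_map[0])
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def lit(ix, iy):
        ix = min(max(ix, 0), width - 1)    # clamp to edge
        iy = min(max(iy, 0), height - 1)
        return 1.0 if ref_depth <= shadow_map[iy][ix] else 0.0

    top = lit(x0, y0) * (1 - fx) + lit(x0 + 1, y0) * fx
    bottom = lit(x0, y0 + 1) * (1 - fx) + lit(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


# Left column holds a near occluder (0.3), right column is open (0.9).
shadow_map = [[0.3, 0.9],
              [0.3, 0.9]]
print(pcf_bilinear(shadow_map, 0.5, 0.5, 0.5))  # fully shadowed
print(pcf_bilinear(shadow_map, 1.0, 1.0, 0.5))  # on the boundary
print(pcf_bilinear(shadow_map, 1.5, 0.5, 0.5))  # fully lit
```

The fractional result on the boundary is what softens the shadow edge; filtering the depth values before comparing would still produce a hard 0 or 1.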

Index types

A geometry mesh can be represented by two separate arrays. One array holds the vertices, and the other holds sets of three indices into that array. Each set of three indices defines a triangle. Adreno GPUs natively support 8-bit, 16-bit, and 32-bit index types. Most mobile applications use 16-bit indices.
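The index type determines how much memory the index array occupies. As an illustration, the hypothetical helper below picks the narrowest unsigned type that can hold the largest index and packs the array accordingly; it is a sketch of the sizing decision, not an API call.

```python
import struct


def pack_indices(indices):
    """Return (type name, packed bytes) using the narrowest index type."""
    highest = max(indices)
    if highest <= 0xFF:
        code, name = "B", "8-bit"
    elif highest <= 0xFFFF:
        code, name = "H", "16-bit"
    else:
        code, name = "I", "32-bit"
    # "<" selects little-endian with standard sizes (1, 2, and 4 bytes).
    return name, struct.pack(f"<{len(indices)}{code}", *indices)


# A quad built from four vertices and two triangles.
quad_indices = [0, 1, 2, 2, 1, 3]
name, data = pack_indices(quad_indices)
print(name, len(data), "bytes")
```

A mesh with more than 65,536 vertices needs 32-bit indices, which doubles the index buffer size compared with the 16-bit case.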

Multisample anti-aliasing

Anti-aliasing is an important technique for improving the quality of generated images. It reduces the visual artifacts of rendering into discrete pixels. Among the various techniques for reducing aliasing effects, multisample anti-aliasing (MSAA) is efficiently supported by Adreno GPUs.

As shown in the following image, multisampling divides every pixel into a set of samples, each of which is treated like a “mini-pixel” during rasterization. Each sample has its own color, depth, and stencil value. Those values are preserved until the image is ready for display. When it is time to compose the final image, the samples are resolved into the final pixel color.

MSAA
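The resolve step can be illustrated with a simple box filter that averages the samples of one pixel. This is a sketch under the assumption of an equal-weight resolve with four samples per pixel; the actual resolve filter is not specified here.

```python
def resolve(samples):
    """Average the color samples of one pixel into its final color."""
    count = len(samples)
    channels = len(samples[0])
    return tuple(sum(s[ch] for s in samples) / count for ch in range(channels))


white, black = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)

# A pixel on a triangle edge: three of four samples are covered.
edge_pixel = [white, white, white, black]
print(resolve(edge_pixel))
```

The edge pixel resolves to 75% gray rather than pure white or black, which is what smooths the staircase pattern along the edge.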

Vertex texture access or vertex texture fetch

Adreno GPUs have the advantage of shared resources for processing vertex and fragment shaders, with direct access to the texture cache. This allows users to easily implement vertex texture algorithms for function definitions, displacement maps, or lighting level-of-detail (LoD) systems on Adreno GPUs.

Vertex texture displacement is an advanced technique that can render realistic water in games on desktops and consoles. The same can now be implemented in applications running on Adreno GPUs.

The following is an example of how to create a texture fetch in the vertex shader:

// vertex shader
attribute vec4 position;
attribute vec2 texCoord;
uniform sampler2D tex;

void main() {
    // Sample the texture from within the vertex shader (vertex texture fetch)
    float offset = texture2D(tex, texCoord).x;
    …..
    gl_Position = vec4(….);
}

Adreno APIs

Adreno GPUs support industry-standard APIs including:

  • OpenGL ES 1.x (fixed function pipeline)

  • OpenGL ES 2.0 (programmable shader pipeline)

  • OpenGL ES 3.0

  • OpenGL ES 3.1 + AEP

  • OpenGL ES 3.2

  • EGL

  • Vulkan 1.0

  • Vulkan 1.1

  • OpenCL 1.1e

  • OpenCL 2.0 Full Profile

  • DirectX 11 FL 9.3

  • DirectX 12 FL 12