Tiling Architecture

Avoid GMEM loads

Clear or invalidate all framebuffer attachments before rendering. This hints to the driver that the previous contents are not needed, so the GPU does not load tile data from system memory into GMEM.

In Vulkan, set loadOp to VK_ATTACHMENT_LOAD_OP_CLEAR or VK_ATTACHMENT_LOAD_OP_DONT_CARE for every render pass attachment whose previous contents do not need to be loaded into GMEM.

For a more detailed explanation of GMEM Loads and how to identify and resolve them, refer to the Understanding and resolving Graphics Memory Loads guide.

Remove unused render targets

When rendering is complete for a tile within GMEM, it gets resolved (GMEM Store) into System Memory. This effectively ‘stitches’ the frame buffer back together.

This operation is optimized but not free. The more tiles are generated, the more GMEM Store operations are needed. Examine the rendered surfaces and verify that each one is needed. Removing unnecessary rendered surfaces decreases the total size of the frame buffer object, decreases the number of tiles needed, and improves overall performance.

GMEM Store Reduction

Down-scale render targets

As with the Remove unused render targets best practice, down-scale render targets where possible: a smaller render target needs fewer tiles and therefore fewer GMEM Store operations. Down-scaling can mean reducing the width and height dimensions or decreasing the render target precision.
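To make the effect of dimensions and precision concrete, here is a rough back-of-the-envelope sketch that divides a surface's footprint by the GMEM capacity and rounds up. This is a deliberately simplified model, not how the driver actually bins: the function name estimate_tile_count is hypothetical, the GMEM size is a parameter you would have to supply for your device, and real tile counts also depend on factors such as depth attachments, MSAA, and alignment.

```c
#include <stdint.h>

/* Simplified model (assumption): tiles = ceil(surface bytes / GMEM bytes).
 * Returns 0 if gmem_bytes is 0 so the caller can detect bad input. */
static uint32_t estimate_tile_count(uint32_t width, uint32_t height,
                                    uint32_t bytes_per_pixel,
                                    uint64_t gmem_bytes)
{
    if (gmem_bytes == 0) {
        return 0;
    }

    /* Widen before multiplying to avoid 32-bit overflow on large surfaces. */
    uint64_t surface_bytes = (uint64_t)width * height * bytes_per_pixel;

    /* Integer ceiling division. */
    return (uint32_t)((surface_bytes + gmem_bytes - 1) / gmem_bytes);
}
```

Under this model and a hypothetical 1 MiB of GMEM, a 1920x1080 surface at 8 bytes per pixel needs 16 tiles, the same surface at 4 bytes per pixel needs 8, and a 960x540 surface at 4 bytes per pixel needs 2. Use measured tile counts from a profiler for real decisions.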

Note

The ‘Rendering Stages’ metric in a Snapdragon Profiler Trace capture can display per-surface information, including the number of tiles the GPU generated for a given surface.

Vulkan subpasses

Vulkan introduced render ‘subpasses’, which let developers set up render pipelines that explicitly state their usage, render target interactions, dependencies, and transitions. With this information, the GPU can make informed decisions on how to handle frame buffer transitions efficiently. On tile-based rendering architectures such as the Adreno GPU, proper subpass use is crucial to using GMEM efficiently.

A properly structured render pass allows Vulkan to instruct the GPU to execute all subpasses on a per-tile basis. That is, the full subpass chain is executed for each tile, which avoids resolving the subpasses to system memory after each pass. The subpasses must be set up correctly for the Vulkan driver to “merge” them into one. Merging can reduce frame time by over 10%, depending on subpass chain complexity and configuration. For subpasses to merge successfully, consider the following:

  • Subpass count > 1

  • Render pass has input attachments

  • Resolve attachment cannot be reused in the following subpass

  • srcAccessMask cannot be VK_ACCESS_SHADER_WRITE_BIT, and dstAccessMask cannot be VK_ACCESS_SHADER_READ_BIT

  • Starting from the second subpass where the input_attachments field is used, the dstAccessMask must be set to VK_ACCESS_INPUT_ATTACHMENT_READ_BIT

Note

Subpass merging only applies when the given surface is rendered in binning mode. The ‘Rendering Stages’ metric in a Snapdragon Profiler Trace capture shows which mode each surface is rendered in and whether the subpasses were merged. The Vulkan Adreno Layer can also help identify subpasses that could not be merged properly by logging the VKDBGUTILWARN003 flag.