Avoid GMEM Loads

Clear or invalidate all framebuffer attachments after every frame

Certain programming techniques that work well on PCs and gaming consoles may not port well to mobile because mobile GPUs have different hardware constraints.

Graphics Memory (GMEM) Loads are among the most common problems affecting GPU performance in mobile apps. In this section, we show you how to use Snapdragon Profiler to find GMEM Loads in your app code.

Anatomy of a GMEM Load

The tiled rendering pipeline of the Qualcomm® Adreno™ GPU includes a render pass, during which each tile is rendered into GMEM. By default, the driver first loads the previous framebuffer data from main memory into GMEM for each tile; in other words, a GMEM Load (or unresolve) occurs.

The problem is that every GMEM Load slows processing. If, however, the framebuffer content is cleared or invalidated, the driver can simply clear that tile in GMEM. Although that involves an additional graphics call and its associated overhead, it is less expensive than loading the framebuffer back into GMEM for every bin (tile) being rendered.
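As a minimal sketch of the "invalidate" half of that idea in OpenGL ES 3.0 terms, the helper below builds the attachment list that an app could pass to glInvalidateFramebuffer once a pass is finished. The FboLayout struct and the helper name are hypothetical, and the sketch assumes that only the color result is needed after the pass. The two constant values are copied from the Khronos GLES3/gl3.h header only so that the code compiles without a GL context.

```c
#include <stddef.h>

/* Values as defined in the Khronos <GLES3/gl3.h> header. They are repeated
   here only so the sketch builds without a GL context; a real app
   includes the header instead. */
#ifndef GL_DEPTH_ATTACHMENT
#define GL_DEPTH_ATTACHMENT   0x8D00
#define GL_STENCIL_ATTACHMENT 0x8D20
#endif

/* Hypothetical description of what is attached to a framebuffer object. */
typedef struct {
    int has_depth;
    int has_stencil;
} FboLayout;

/* Fills `out` with the attachments whose contents are not needed after the
   pass (assumption: depth and stencil are scratch data, color is kept).
   Returns the number of entries written. */
size_t end_of_pass_discards(FboLayout fbo, unsigned int out[2])
{
    size_t n = 0;
    if (fbo.has_depth)   out[n++] = GL_DEPTH_ATTACHMENT;
    if (fbo.has_stencil) out[n++] = GL_STENCIL_ATTACHMENT;
    return n;
}

/* Intended use in the render loop (needs a current OpenGL ES 3.0 context):
 *
 *   glBindFramebuffer(GL_FRAMEBUFFER, fbo_id);
 *   glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);
 *   ... draw calls ...
 *   GLenum discards[2];
 *   GLsizei n = (GLsizei)end_of_pass_discards(layout, discards);
 *   glInvalidateFramebuffer(GL_FRAMEBUFFER, n, discards);
 */
```

Note that glClear honors the scissor box and the write masks set with glColorMask, glDepthMask and glStencilMask, so a clear that is meant to cover a whole attachment should be issued with the scissor test disabled and the masks enabled.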

There are two main causes of GMEM Loads:

  • Improper hints to the driver: The app code makes the driver think that the previous content of the framebuffer is needed, usually by neglecting to clear the buffer. The fix is relatively easy and pays off well in reduced render time. This applies mostly to OpenGL ES, since Vulkan requires the app to state explicitly, through each attachment's load operation, whether previous content is loaded.
  • The algorithm: Certain API calls, such as glReadPixels and glFlush, force the pipeline to flush so that results are available. Doing this mid-frame causes GMEM Loads when drawing of the frame resumes. You can usually avoid these GMEM Loads by modifying the algorithm.
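The second cause can be illustrated with a toy model. The sketch below is not a driver simulation: the FrameOp enum and the counting function are hypothetical, and the only rule it encodes is the one stated above, namely that drawing which resumes after a flush-forcing call stands for one potential GMEM Load.

```c
#include <stddef.h>

/* Hypothetical operations recorded for one surface during a frame. */
typedef enum { OP_CLEAR, OP_DRAW, OP_READ_PIXELS, OP_FLUSH } FrameOp;

/* Counts how often drawing resumes after a call that forces the pipeline
   to flush. In this simplified model, each resume is one potential
   GMEM Load. */
size_t count_mid_frame_resumes(const FrameOp *ops, size_t n)
{
    size_t resumes = 0;
    int flushed = 0;  /* set once a flush-forcing call is seen */

    for (size_t i = 0; i < n; ++i) {
        switch (ops[i]) {
        case OP_READ_PIXELS:
        case OP_FLUSH:
            flushed = 1;
            break;
        case OP_CLEAR:
            flushed = 0;  /* old contents are replaced, nothing to load */
            break;
        case OP_DRAW:
            if (flushed) {
                ++resumes;
                flushed = 0;
            }
            break;
        }
    }
    return resumes;
}
```

Under this model, a frame recorded as clear, draw, glReadPixels, draw, glFlush, draw counts two resumes, while the same work reordered as clear, draw, draw, draw, glReadPixels counts none. Moving the readback to the end of the frame is one example of the kind of algorithm change meant here.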

Detecting GMEM Loads in Snapdragon Profiler

Using Snapdragon Profiler in Trace Capture mode, you can enable the Rendering Stages metric, which shows GMEM Loads in their own track. The screenshot below, based on the Depth of Field demo application, shows red blocks indicating that GMEM Loads (Depth Stencil) take place as four different surfaces (0, 16, 32 and 48) are rendered:

The Settings dialog for Rendering Stages shows that those GMEM Loads take up about 9 percent of total rendering time:

If the GMEM Loads are unnecessary, you can reclaim about 9 percent of frame time.

The next step is to use Snapshot Capture mode in Snapdragon Profiler to try to determine what is causing the GMEM Loads, as shown here:

  1. The first surface with GMEM Loads is the first surface bound with glBindFramebuffer. The framebuffer parameter is set to 1.
  2. Select framebuffer object 1 (FBO 1) to examine Resources. The Inspector View shows that the surface has attachments for Color, Depth and Stencil.
  3. The app calls glClearColor and glClearDepth but never clears the Stencil attachment. That leaves the driver assuming that the content of the Stencil attachment is relevant for the next frame, which causes the GMEM Loads.
  4. Similarly, the other three surfaces (IDs 5, 7 and 9 here) have a Stencil attachment and do not clear it.
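In OpenGL ES terms, the finding in steps 3 and 4 amounts to a glClear mask that lacks the stencil bit. The sketch below expresses that as a mask comparison; the helper and the enum names are hypothetical, and the bit values are copied from the Khronos GLES3/gl3.h header only so that the code compiles without a GL context.

```c
/* Bit values as defined in the Khronos <GLES3/gl3.h> header, repeated here
   only so the sketch builds without a GL context. */
#ifndef GL_COLOR_BUFFER_BIT
#define GL_DEPTH_BUFFER_BIT   0x00000100
#define GL_STENCIL_BUFFER_BIT 0x00000400
#define GL_COLOR_BUFFER_BIT   0x00004000
#endif

enum {
    /* The surface has Color, Depth and Stencil attachments. */
    ATTACHED    = GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT,
    /* Mask before the fix: Stencil is never cleared. */
    MASK_BEFORE = GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT,
    /* Mask after the fix: every attachment is cleared. */
    MASK_AFTER  = MASK_BEFORE | GL_STENCIL_BUFFER_BIT
};

/* Returns the clear bits of attachments that exist on the surface but are
   not covered by the mask the app passes to glClear(). */
unsigned int uncleared_bits(unsigned int attached_bits, unsigned int clear_mask)
{
    return attached_bits & ~clear_mask;
}

/* The corrected clear in the app (needs a current GL context):
 *
 *   glClearStencil(0);
 *   glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);
 */
```

Here uncleared_bits(ATTACHED, MASK_BEFORE) leaves only GL_STENCIL_BUFFER_BIT set, and uncleared_bits(ATTACHED, MASK_AFTER) is zero. A check like this could run in a debug build whenever a framebuffer is bound, as a cheap guard against reintroducing the problem.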

After modifying the code to explicitly clear the Stencil content of the framebuffer, you can validate the results in Trace Capture mode, which no longer shows the GMEM Load Depth Stencil track:

In this case, rendering time is shortened by about 9 percent.

For more information on GMEM Loads (especially in extended reality apps), see the QDN blog post called Profiling VR Apps for Better Performance.