Identify Application Bottlenecks

Preliminary, high-level steps to explore performance

How many frames per second?

You may already know that you have a performance problem before you start using Snapdragon Profiler. Even if you don’t, a good first step is to examine the app’s current, overall performance and try to identify application bottlenecks.

Frame rate is an ideal place to start. Apps such as games usually run best at 30 or 60 frames per second (fps); virtual and extended reality (VR/XR) apps sometimes need higher rates.

Frame rate has two aspects. The first is average frame rate, a measure of how fast the app runs on average. The second is the consistency of the frame rate. Even if your average stays close to your target frame rate, occasional long frames may miss that target. User experience then suffers from stuttering and glitches, and motion is not smooth, so your app could use optimization.

If your app is not optimized, the average frame rate may be lower than your target and the app will not reach its desired performance level. If, on the other hand, you have optimized your app, your average may be closer to your target but you may still see periodic spikes in frame time. The spikes hamper animation, so you’ll want to smooth those out by identifying the problems and modifying your code.

In both cases, Snapdragon Profiler can take you straight to the performance level of your app in fps. The screenshot below shows an average (light blue line) of 42.022 fps.

While 42 fps may suffice as an average, the range (blue line) periodically falls as low as 37.322 fps. That suggests that the app is dropping frames, which harms performance.
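The gap between an average and the low end of a range can be illustrated with a small sketch. It assumes you have already collected per-frame durations in milliseconds from your own timing code; the `FrameStats` struct and `SummarizeFrames` function are hypothetical names, not part of Snapdragon Profiler.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Summary of a run of frame times. All names here are illustrative.
struct FrameStats {
    double average_fps;  // frames divided by total elapsed time
    double worst_fps;    // instantaneous rate of the slowest frame
    int long_frames;     // number of frames that exceeded the budget
};

// frame_ms: per-frame durations in milliseconds (non-empty, all positive).
// budget_ms: per-frame budget, for example 1000.0 / 60 for a 60 fps target.
FrameStats SummarizeFrames(const std::vector<double>& frame_ms, double budget_ms) {
    assert(!frame_ms.empty());
    const double total_ms = std::accumulate(frame_ms.begin(), frame_ms.end(), 0.0);
    const double worst_ms = *std::max_element(frame_ms.begin(), frame_ms.end());
    const auto long_frames = std::count_if(
        frame_ms.begin(), frame_ms.end(),
        [budget_ms](double ms) { return ms > budget_ms; });

    FrameStats stats;
    stats.average_fps = 1000.0 * static_cast<double>(frame_ms.size()) / total_ms;
    stats.worst_fps = 1000.0 / worst_ms;
    stats.long_frames = static_cast<int>(long_frames);
    return stats;
}
```

A run of three 10 ms frames and one 50 ms frame averages 50 fps, yet the slowest frame corresponds to only 20 fps. An average alone would hide that kind of stutter.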

Note also that applications should target frame rates that are divisors or multiples of their platform’s Vsync rate. Since a typical platform has a 60 Hz Vsync, 30 fps and 60 fps are really the only acceptable targets.
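One way to see why divisors of the Vsync rate matter is the following sketch. It uses a deliberately simplified model, assumed here for illustration: with Vsync on, a finished frame is shown at the next display refresh, so each frame occupies a whole number of refresh intervals. `EffectiveFps` is a hypothetical helper, and real presentation pipelines (triple buffering, variable refresh) can behave differently.

```cpp
#include <cassert>
#include <cmath>

// Simplified model: a frame that takes render_ms to produce stays on the
// queue until the first refresh after it finishes, so its on-screen interval
// is rounded up to a whole number of refresh periods.
double EffectiveFps(double render_ms, double refresh_hz) {
    assert(render_ms > 0.0 && refresh_hz > 0.0);
    const double refresh_ms = 1000.0 / refresh_hz;
    const double intervals = std::ceil(render_ms / refresh_ms);
    return refresh_hz / intervals;
}
```

Under this model, on a 60 Hz display a steady 10 ms frame is shown at 60 fps, while a steady 20 ms frame misses every other refresh and is shown at 30 fps, not 50 fps.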

Exploring potential chokepoints

While no single metric can show you where your performance problems lie and how to address them, Snapdragon Profiler lets you examine dozens of metrics and begin to understand how your app is interacting with the hardware. The sections in the Trace Capture mode screenshot below correspond to three important metrics you can start with:

  • Each Rendering Stage in Snapdragon Profiler is a metric that represents the app’s execution on the GPU. Each of the data tracks is a subset of the metric, and the number of tracks varies depending on the application. In the following screenshot, the green, magenta and purple bars show individual surfaces and the tracks under the Surface bars represent related rendering stages:
  • GPU Activity (below) is a system metric that shows the interaction between the CPU and GPU.
  • CPU Scheduling (“Trace Kernel - Sched CPU”) is another system metric. It provides an overview of the app’s execution on each CPU core. You can see which parts of your app are running where, and whether you have a scheduling or thread contention problem.

With those and other metrics, you can explore three of the main areas where performance may be limited: the GPU, the CPU, and Vsync (the vertical sync refresh of the display).

GPU-bound app

In a graphics-intensive app, it’s easiest to start the process of elimination with the GPU.

In the Real-Time view of Snapdragon Profiler, GPU % Utilization is a top-level indicator. The screenshot below shows utilization in the range of 26 to 38 percent:

Compare that to the following screenshot, showing utilization in the range of 98 to 100 percent:

The latter is a strong hint that the app is GPU-bound.

Besides the Real-Time view, Trace Capture mode in Snapdragon Profiler offers you another point of reference. If the app is not GPU-bound, then gaps in GPU activity would likely show up:

If, however, the app is GPU-bound, then GPU execution might be delayed and the GPU might constantly be rendering surfaces:

We’ll continue down the path of the GPU-bound app below, but first we examine the other two potential chokepoints.

CPU-bound app

If the GPU hunch doesn’t pan out, see whether the app is CPU-bound.

Unlike GPU % Utilization for a GPU-bound app, CPU % Utilization in the Real-Time view is not a reliable indicator of a CPU-bound app, as shown in these screenshots:

Average CPU utilization is 16 percent in the first app and 23 percent in the second. The first is CPU-bound, but the difference between them is not as striking as in the GPU-bound app above, so the app may appear not to be CPU-bound.

Two strong criteria for a CPU-bound app are that it is not GPU-bound and that it has an average frame time exceeding 16 ms, roughly the per-frame budget at 60 fps. To determine whether an app is CPU-bound, consider the multi-threaded nature of the app and the CPU. It also helps to go beyond CPU % Utilization and examine metrics like frequencies and thread scheduling.

In the Trace Capture mode screenshot below, the thread in Sched CPU 6 appears to bottleneck the CPU:

The next step is to look deeper into that thread to identify hotspots and candidates for multi-threading. The Sampling capture mode samples the CPU program counter at a fixed interval and identifies CPU hot paths. It offers a statistical representation of activity, including the time spent in each function and library. You can see which functions in your code take the most execution time.

The following capture shows both the functions and the sequence of functions running on the CPU to render cloth textures. The red and orange blocks indicate hotspots in the app code:

In this case, CPU sampling shows SatisfyConstraints() as the biggest hotspot, with 98 percent of activity.
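The statistical idea behind a figure like that can be sketched in a few lines: if each sample records which function the program counter was in, a function’s share of samples approximates its share of execution time. This is only an illustration of the principle, not how Snapdragon Profiler is implemented; `HotspotShare` and its input format are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// samples: one function name per program-counter sample (non-empty).
// Returns each function's share of the samples as a percentage.
std::map<std::string, double> HotspotShare(const std::vector<std::string>& samples) {
    assert(!samples.empty());
    std::map<std::string, int> hits;
    for (const std::string& fn : samples) {
        ++hits[fn];  // count how often each function was observed
    }
    std::map<std::string, double> share;
    for (const auto& entry : hits) {
        share[entry.first] =
            100.0 * entry.second / static_cast<double>(samples.size());
    }
    return share;
}
```

With more samples, the estimate gets closer to the true distribution of time, which is why a sampling capture benefits from running over a representative stretch of the app.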

Going even deeper, Snapdragon Profiler supports user markers and the Native Tracing API built into the Android NDK. In other words, Android developers can use that API to insert trace markers into the app code and see this data in Snapdragon Profiler.

In this example, you can use android/trace.h to instrument the call to SatisfyConstraints(), a worker function, and trace it back to the main thread:
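A minimal sketch of such instrumentation follows. `ATrace_beginSection` and `ATrace_endSection` come from android/trace.h (available from API level 23); the `ScopedTrace` wrapper, the `TracedSatisfyConstraints` function, and the body of `SatisfyConstraints` are hypothetical stand-ins, since the app’s actual code is not shown here. The non-Android branch exists only so the sketch compiles on a desktop.

```cpp
#include <cassert>

#if defined(__ANDROID__)
#include <android/trace.h>  // ATrace_beginSection / ATrace_endSection
#else
// Host-side stand-ins (assumption): they only count calls so the sketch
// can be compiled and exercised off-device.
static int g_open_sections = 0;
static int g_total_sections = 0;
inline void ATrace_beginSection(const char*) { ++g_open_sections; ++g_total_sections; }
inline void ATrace_endSection() { --g_open_sections; }
#endif

// RAII helper: opens a trace section on construction and closes it on
// destruction, so the section ends even on an early return.
class ScopedTrace {
public:
    explicit ScopedTrace(const char* name) {
        assert(name != nullptr);
        ATrace_beginSection(name);
    }
    ~ScopedTrace() { ATrace_endSection(); }
    ScopedTrace(const ScopedTrace&) = delete;
    ScopedTrace& operator=(const ScopedTrace&) = delete;
};

// Hypothetical stand-in for the app's cloth solver.
static int g_solver_calls = 0;
void SatisfyConstraints() { ++g_solver_calls; }

// Wraps the hotspot in a named section that shows up in the trace.
void TracedSatisfyConstraints() {
    ScopedTrace trace("SatisfyConstraints");
    SatisfyConstraints();
}
```

The RAII wrapper is a design choice rather than a requirement: calling the two ATrace functions directly works as well, as long as every begin is matched by an end on the same thread.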

A texture in the rendered frame requires one function per cloth, which ticks as needed for the elapsed time. Once you’ve modified the app code to thread the functions out, Snapdragon Profiler can display the following inclusive, correlated view of all activity at once:

Now, cloth computation is taking place in multiple threads. The output in CPU Scheduling shows more threading. The app is no longer CPU-bound, so the CPU waits for draws to be dispatched.
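The kind of change involved can be sketched as follows, assuming each cloth can be ticked independently of the others. The `Cloth` struct, `TickCloth`, and `TickAllCloths` are hypothetical; the original app’s code is not shown in this document.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical cloth state; a real cloth would hold particles and constraints.
struct Cloth {
    int ticks = 0;
};

// Stand-in for the per-cloth work (for example, a constraint solver pass).
void TickCloth(Cloth& cloth, double elapsed_s) {
    assert(elapsed_s >= 0.0);
    ++cloth.ticks;
}

// Runs one worker thread per cloth and waits for all of them to finish.
// Each thread touches only its own Cloth, so no locking is needed here.
void TickAllCloths(std::vector<Cloth>& cloths, double elapsed_s) {
    std::vector<std::thread> workers;
    workers.reserve(cloths.size());
    for (Cloth& cloth : cloths) {
        workers.emplace_back([&cloth, elapsed_s] { TickCloth(cloth, elapsed_s); });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
}
```

Creating threads every frame has its own cost, so a production app would more likely hand the per-cloth work to a persistent thread pool. The sketch only shows how the work is split.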

That process is typical for identifying, diagnosing and solving a performance problem with Snapdragon Profiler.

Vsync-bound app

If you determine that the app is neither CPU- nor GPU-bound, then it may be Vsync-bound; that is, it may be running as fast as the display hardware can accommodate. Using the Trace Capture mode in Snapdragon Profiler, you might see something similar to this:

Note that the frame time is right around 16 ms (1 second divided by 60 fps is about 16.7 ms) and there is a gap near the end of the frame during which both GPU and CPU are waiting for an available surface to render to. That means that the app is running as fast as the display’s refresh rate will allow.
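Pulling the three cases together, a first-pass triage might look like the sketch below. It is a heuristic only: the 95 percent GPU threshold, the 0.5 ms slack, and all names are assumptions chosen for illustration, not values defined by Snapdragon Profiler, and the inputs are numbers you would read off the profiler by hand.

```cpp
#include <cassert>

enum class Bottleneck { kGpuBound, kCpuBound, kVsyncBound };

// Hypothetical summary of values observed in the profiler.
struct FrameSummary {
    double gpu_utilization_pct;  // GPU % Utilization, 0 to 100
    double avg_frame_ms;         // average frame time in milliseconds
};

Bottleneck Triage(const FrameSummary& s, double refresh_hz = 60.0) {
    assert(refresh_hz > 0.0);
    const double kGpuBusyPct = 95.0;  // assumed "nearly saturated" threshold
    const double kSlackMs = 0.5;      // assumed tolerance around the budget
    const double budget_ms = 1000.0 / refresh_hz;

    // Start with the GPU, as the process of elimination above does.
    if (s.gpu_utilization_pct >= kGpuBusyPct) return Bottleneck::kGpuBound;
    // Not GPU-bound but over budget: the CPU is the likely suspect.
    if (s.avg_frame_ms > budget_ms + kSlackMs) return Bottleneck::kCpuBound;
    // Neither: the app is probably pacing itself to the display refresh.
    return Bottleneck::kVsyncBound;
}
```

A result from a helper like this is a starting hypothesis to confirm in Trace Capture mode, not a diagnosis.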

The goal of many apps is to run as fast as the display allows. But even when the app is Vsync-bound, there may still be opportunities for optimization, with benefits beyond a higher frame rate.

Most problems in mobile apps boil down to power consumption and, therefore, battery life. Even if your app meets the target frame rate without being CPU- or GPU-bound, you may still want to optimize your code. An app modified to use less power and run at a lower temperature would meet the target frame rate while doing less work. It might also offer better performance on lower-end devices.