Sequential variant

Without tuning

As shown below, a simple for-loop passes 10 million integers, one at a time, through the is_prime API. (The API already has some built-in optimizations, such as skipping even numbers and testing divisors only up to sqrt(n).)

Sequential Variant Image 1
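
For readers who cannot see the image, the sequential workload can be sketched as follows. This is a minimal illustration in Python, not the tutorial's own code: the body of is_prime is an assumption based only on the two optimizations described above, and count_primes (including the choice to count the primes and to start the range at 0) is a hypothetical helper name introduced here.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division using the two shortcuts described in the text (assumed body)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False  # skip even numbers
    # Test odd divisors only, and only up to floor(sqrt(n)).
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def count_primes(limit: int) -> int:
    """Hypothetical sequential driver: test every integer in [0, limit) in order."""
    count = 0
    for n in range(limit):  # one iteration at a time, on a single thread
        if is_prime(n):
            count += 1
    return count
```

Calling count_primes(10_000_000) would match the workload size used in this tutorial. Because each iteration runs only after the previous one finishes, the loop can occupy only one core at any moment.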

Qualcomm® Snapdragon™ Profiler is ideal for revealing the runtime characteristics of this algorithm: how it affects CPU frequencies, where it gets scheduled in the system, and how much CPU capacity it uses.

To gather those system metrics in Snapdragon Profiler, select Realtime mode, then track CPU Core Frequency and CPU Core Utilization.

Sequential Variant Image 2

The Snapdragon 835 mobile platform has 8 CPU cores: 4 LITTLE and 4 big. Maximum frequency is 1.9 GHz on the LITTLE cores and 2.36 GHz on the big cores.

The first two tracks of the display in Snapdragon Profiler show CPU 0 Frequency for the LITTLE cluster and CPU 4 Frequency for the big cluster.

Sequential Variant Image 3

As shown above, the algorithm always runs on one core at a time because it’s a sequential for-loop. It starts on core 0 (CPU 0), switches to core 1, returns to core 0, then finishes on core 2.

The next thing to notice is that it uses the full capacity of the core (“100.000”):

Sequential Variant Image 4

The algorithm runs almost entirely on the LITTLE cluster and drives it to its maximum frequency of 1.9 GHz:

Sequential Variant Image 5

Then, notice that the algorithm requires 34 seconds of processing time:

Sequential Variant Image 6

And, as shown on the left, the algorithm consumes 125 mW of power on the CPU.

Those metrics form the baseline for this tutorial.
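
Two derived figures make the baseline easier to compare against later variants. The arithmetic below is a rough sketch that assumes the 125 mW reading is the average CPU power over the full 34 seconds, which the measurements above do not state explicitly.

```latex
\begin{align*}
  \text{Throughput} &\approx \frac{10^{7}\ \text{integers}}{34\ \text{s}}
                     \approx 2.9 \times 10^{5}\ \text{integers/s} \\
  \text{CPU energy} &\approx P \cdot t
                     = 0.125\ \text{W} \times 34\ \text{s}
                     \approx 4.25\ \text{J}
\end{align*}
```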

Next: Parallel variant