Patterns

Simplifying parallel programming

Commonly used constructs

Patterns are the part of the Qualcomm® Snapdragon™ Heterogeneous Compute SDK designed to simplify parallel programming and to improve application performance. Patterns abstract commonly used CPU constructs like the following:

  • Data parallelism
  • Multi-branch recursion (divide and conquer), as used in sorting and binary search algorithms
  • Pipeline computation, in which the app runs a pipeline of stages: for example, some work on the CPU, then image processing on the GPU, then computation on the DSP
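
To make the divide-and-conquer construct from the list above concrete, here is a minimal sketch in standard C++ only. It does not use the SDK: the function name divide_and_conquer_sum, the cutoff, and the depth limit are all hypothetical choices made for illustration. It sums a range by splitting it in half, running one half asynchronously and the other on the calling thread.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not an SDK function): sums data[first, last) by
// splitting the range in half and recursing on both halves.
// Assumes first <= last <= data.size().
long long divide_and_conquer_sum(const std::vector<int>& data,
                                 std::size_t first, std::size_t last,
                                 int depth = 0) {
    constexpr std::size_t kSerialCutoff = 1024;  // small ranges: plain loop
    constexpr int kMaxParallelDepth = 3;         // stop spawning below this depth

    if (last - first <= kSerialCutoff || depth >= kMaxParallelDepth) {
        const auto begin = data.begin() + static_cast<std::ptrdiff_t>(first);
        const auto end = data.begin() + static_cast<std::ptrdiff_t>(last);
        return std::accumulate(begin, end, 0LL);
    }

    const std::size_t mid = first + (last - first) / 2;

    // The left half runs on another thread; the right half runs here.
    std::future<long long> left =
        std::async(std::launch::async, [&data, first, mid, depth] {
            return divide_and_conquer_sum(data, first, mid, depth + 1);
        });
    const long long right = divide_and_conquer_sum(data, mid, last, depth + 1);

    return left.get() + right;
}
```

The depth limit keeps the number of concurrently running branches small (about eight here), which is the kind of bookkeeping a pattern is meant to hide from the application.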

Most patterns are currently CPU-centric: they parallelize an algorithm across all CPU cores (big and LITTLE).

Given the heterogeneous nature of the Snapdragon mobile platform, it makes sense to schedule more work on the big cluster than on the LITTLE cluster, because the big cores are the higher-performance cores and complete the same work faster.
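
The idea of giving the big cluster a larger share of the work can be sketched as simple proportional arithmetic. The helper below is hypothetical and is not how the SDK runtime is documented to schedule work; the weights stand in for an assumed relative throughput of the two clusters.

```cpp
#include <cstddef>

// Result of dividing a workload between the two clusters.
struct ClusterSplit {
    std::size_t big_count;     // items assigned to the big cluster
    std::size_t little_count;  // items assigned to the LITTLE cluster
};

// Hypothetical helper: divides total_items in the ratio
// big_weight : little_weight. The weights are assumptions supplied by the
// caller, for example measured relative throughput of each cluster.
ClusterSplit split_by_weight(std::size_t total_items,
                             unsigned big_weight, unsigned little_weight) {
    if (big_weight == 0 && little_weight == 0) {
        big_weight = 1;  // no preference given: fall back to an even split
        little_weight = 1;
    }
    const std::size_t total_weight =
        static_cast<std::size_t>(big_weight) + little_weight;

    // Integer division rounds the big share down; the remainder goes to LITTLE.
    const std::size_t big_count = total_items * big_weight / total_weight;
    return {big_count, total_items - big_count};
}
```

With an assumed 3:1 throughput ratio, 1000 items would be split as 750 for the big cluster and 250 for the LITTLE cluster.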

The SDK also includes pattern tuners for optimizing parallel execution.

Typical patterns

Following are some of the patterns available in the Snapdragon Heterogeneous Compute SDK.

[Image: Patterns List]

The first four patterns represent simple data parallelism constructs.

Pattern tuners

The pattern tuner APIs affect the power-performance characteristics of mobile applications. They allow customization of parallel algorithm execution for finer optimizations.

[Image: Pattern Tuners]

For instance, set_chunk_size can specify a large chunk for the big cluster and a small chunk for the LITTLE cluster, and set_max_doc specifies the number of cores over which to parallelize the application. (Snapdragon chipsets have 4 or 8 CPU cores.)
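
To show what a chunk size and a core limit control, here is a minimal sketch in standard C++. TunerSketch and chunked_parallel_for are hypothetical names, not the SDK's tuner class or any of its functions; the sketch uses a single chunk size and does not model separate sizes per cluster. Each worker repeatedly claims the next chunk of iterations until the range is exhausted.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for tuner settings; the fields mirror the ideas in
// the text, not the SDK's actual tuner class.
struct TunerSketch {
    std::size_t chunk_size = 64;  // iterations claimed per grab
    unsigned max_doc = 4;         // upper bound on concurrent workers
};

// Runs fn(i) for every i in [first, last), handing out work in chunks.
template <typename Fn>
void chunked_parallel_for(std::size_t first, std::size_t last,
                          const TunerSketch& tuner, Fn fn) {
    if (first >= last) return;
    const std::size_t chunk = std::max<std::size_t>(1, tuner.chunk_size);
    const unsigned workers = std::max(1u, tuner.max_doc);
    std::atomic<std::size_t> next{first};  // start of the next unclaimed chunk

    auto worker = [&] {
        for (;;) {
            const std::size_t begin = next.fetch_add(chunk);
            if (begin >= last) return;
            const std::size_t end = std::min(begin + chunk, last);
            for (std::size_t i = begin; i < end; ++i) fn(i);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 1; t < workers; ++t) threads.emplace_back(worker);
    worker();  // the calling thread participates as one of the workers
    for (auto& th : threads) th.join();
}
```

Larger chunks reduce the overhead of claiming work, while smaller chunks balance load more evenly; lowering max_doc trades throughput for fewer active cores.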

Code sample

The code in the image below shows how to use patterns.

[Image: Pattern Tuners]

The previous code sample used the vector_double API in a sequential for-loop that visited every element of the vector and modified it. Instead of iterating sequentially, this code sample parallelizes the iteration over the collection using the pfor_each pattern. The first two parameters define the range of the collection on which pfor_each acts, and the third parameter is the lambda, that is, the code to execute on every element.

By default, the Snapdragon Heterogeneous Compute runtime tries to maximize parallelism across all available CPU cores. On the Snapdragon 845 with eight cores (four big, four LITTLE), for example, the pfor_each pattern could parallelize the iteration over the entire collection across all eight cores for maximum performance.
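
Because the original sample survives only as an image, the following is a rough stand-in written in standard C++ rather than with the SDK. pfor_each_sketch and double_all are hypothetical names; the sketch only imitates the shape described above (a range given by two parameters, then a lambda) and spreads the range over as many threads as the hardware reports.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallel for-each over indices [first, last).
// It is not the SDK's pfor_each; it only has a similar call shape.
template <typename Fn>
void pfor_each_sketch(std::size_t first, std::size_t last, Fn fn) {
    if (first >= last) return;
    const std::size_t count = last - first;

    // hardware_concurrency() may return 0 when the core count is unknown.
    const std::size_t cores = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t workers = std::min(cores, count);
    const std::size_t block = (count + workers - 1) / workers;  // round up

    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = first + w * block;
        const std::size_t end = std::min(begin + block, last);
        if (begin >= end) break;  // rounding can leave trailing workers idle
        threads.emplace_back([begin, end, &fn] {
            for (std::size_t i = begin; i < end; ++i) fn(i);
        });
    }
    for (auto& t : threads) t.join();
}

// Doubles every element in parallel instead of in a sequential for-loop.
void double_all(std::vector<int>& values) {
    pfor_each_sketch(0, values.size(),
                     [&values](std::size_t i) { values[i] *= 2; });
}
```

Each index is touched by exactly one thread, so the lambda needs no locking as long as it only modifies its own element.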


Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.