Commonly used constructs
Patterns are the part of the Snapdragon® Heterogeneous Compute SDK designed to simplify parallel programming and to improve application performance. Patterns abstract commonly used CPU constructs like the following:
- Data parallelism
- Multi-branch recursion (such as divide and conquer) for sorting and binary search algorithms
- Pipeline computation, in which the app runs successive stages on different processors: for example, some work on the CPU, then image processing on the GPU, then computation on the DSP
Most patterns are currently CPU-centric, parallelizing the algorithm across all CPU cores (big and LITTLE).
Given the heterogeneous nature of the Snapdragon mobile platform, it makes sense to schedule more work on the big cluster than on the LITTLE cluster, because the big cores complete the same work faster.
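To make the idea of uneven scheduling concrete, here is a minimal C++ sketch that splits a number of work items between the two clusters in proportion to assumed relative throughputs. The helper name split_by_weight_sketch and the weights are hypothetical and chosen only for illustration; this is not how the SDK itself decides the split.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of the SDK): divide n work items between a
// big cluster and a LITTLE cluster in proportion to assumed throughput weights.
// Returns {items for big, items for LITTLE}; the two always sum to n.
std::pair<std::size_t, std::size_t> split_by_weight_sketch(std::size_t n,
                                                           std::size_t big_weight,
                                                           std::size_t little_weight) {
    assert(big_weight + little_weight > 0);
    // Integer division rounds the big share down; the remainder goes to LITTLE.
    const std::size_t big_share = n * big_weight / (big_weight + little_weight);
    return {big_share, n - big_share};
}
```

With an assumed 3:1 throughput ratio, 1000 items would be split into 750 for the big cluster and 250 for the LITTLE cluster.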
The SDK also includes pattern tuners for optimizing parallel execution.
Following are some of the patterns available in the Snapdragon Heterogeneous Compute SDK.
The first four patterns represent simple data parallelism constructs.
The pattern tuner APIs affect the power-performance characteristics of mobile applications. They allow customization of parallel algorithm execution for finer optimizations.
For instance, set_chunk_size can specify a large chunk for the big cluster and a small chunk for the LITTLE cluster, while set_max_doc caps the number of cores over which to parallelize the application. (Snapdragon chipsets have 4 or 8 cores.)
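The effect of these two knobs can be sketched in plain C++ with standard threads. The tuner_sketch struct and tuned_for_each_sketch function below are hypothetical stand-ins that only mirror the ideas behind set_chunk_size and set_max_doc; they are not the SDK's tuner API, and unlike the SDK they use a single chunk size rather than one per cluster.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical tuning knobs (not the SDK's tuner type).
struct tuner_sketch {
    std::size_t chunk_size = 64;  // elements handed to a worker at a time
    std::size_t max_doc = 4;      // upper bound on concurrent worker threads
};

// Applies fn to every element of data. Workers repeatedly claim the next
// chunk of chunk_size elements until the whole vector has been processed.
template <typename T, typename Fn>
void tuned_for_each_sketch(std::vector<T>& data, Fn fn, const tuner_sketch& t) {
    assert(t.chunk_size > 0 && t.max_doc > 0);
    const std::size_t n = data.size();
    std::atomic<std::size_t> next{0};  // index of the next unclaimed element

    auto worker = [&] {
        for (;;) {
            const std::size_t begin = next.fetch_add(t.chunk_size);
            if (begin >= n) return;  // no chunks left
            const std::size_t end = std::min(begin + t.chunk_size, n);
            for (std::size_t i = begin; i < end; ++i) fn(data[i]);
        }
    };

    // Never start more workers than there are chunks or than max_doc allows.
    const std::size_t chunks = (n + t.chunk_size - 1) / t.chunk_size;
    const std::size_t workers = std::min(t.max_doc, chunks);
    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (std::size_t w = 0; w < workers; ++w) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

In this sketch a larger chunk_size means fewer, longer-running chunks with less scheduling overhead, while a smaller one balances load more evenly; max_doc trades peak throughput for fewer active cores.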
The code in the image below shows how to use patterns.
The previous code sample used the vector_double API in a sequential for-loop that visits every element of the vector and modifies it. Instead of iterating over the collection sequentially, this code sample parallelizes the iteration using the pfor_each pattern. The first two parameters define the range of the collection on which pfor_each should act, and the third parameter is the lambda, that is, the code to execute on every element.
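As a rough illustration of the range-plus-lambda shape described above, the following C++ sketch applies a function to every element of a range using standard threads. The name parallel_for_each_sketch is hypothetical; it is a simplified stand-in, not the SDK's pfor_each, and it splits the range into equal contiguous blocks rather than using the SDK's scheduler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallel for-each (not the SDK's pfor_each).
// Applies fn to every element of [first, last), giving each worker thread
// one contiguous block of the range.
template <typename RandomIt, typename Fn>
void parallel_for_each_sketch(RandomIt first, RandomIt last, Fn fn) {
    using diff_t = typename std::iterator_traits<RandomIt>::difference_type;
    const std::size_t n = static_cast<std::size_t>(std::distance(first, last));
    if (n == 0) return;

    // hardware_concurrency() may return 0 when unknown, so fall back to 1.
    const std::size_t hw =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    const std::size_t workers = std::min(hw, n);
    const std::size_t block = (n + workers - 1) / workers;  // ceiling division
    assert(block > 0);

    std::vector<std::thread> threads;
    threads.reserve(workers);
    for (std::size_t begin = 0; begin < n; begin += block) {
        const std::size_t end = std::min(begin + block, n);
        threads.emplace_back([=] {
            std::for_each(first + static_cast<diff_t>(begin),
                          first + static_cast<diff_t>(end), fn);
        });
    }
    for (auto& t : threads) t.join();
}
```

A call such as parallel_for_each_sketch(v.begin(), v.end(), [](double& x) { x *= 2.0; }) doubles every element of a std::vector<double> named v. The lambda must be safe to run concurrently on different elements, which holds here because each invocation touches only its own element.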
By default, the Snapdragon Heterogeneous Compute runtime tries to maximize parallelism across all available CPU cores. On the Snapdragon 845 with eight cores (four big, four LITTLE), for example, the pfor_each pattern could spread the iteration over the collection across all eight cores for maximum performance.
Snapdragon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.