Commonly used constructs
At the heart of the Qualcomm® Snapdragon™ Heterogeneous Compute SDK are tasks — independent units of work that can be scheduled and executed asynchronously across CPU, GPU and DSP.
A task has two main branches:
- Control: anything that can be run on the given compute core, such as C++ lambda expressions and functions, kernel abstractions and patterns
- Data: any piece of memory, such as buffers, function arguments and dynamically allocated system memory
When developing asynchronously executed tasks, it is often necessary to specify dependencies among them, for example that task B should execute only after task A is complete. The task structure in the Snapdragon Heterogeneous Compute SDK provides task-management APIs that let developers set task dependencies and even generate a heterogeneous task graph, with dependencies set between tasks across compute cores.
Consider a task A executing on CPU and a task B on GPU, with a requirement that B run only after A is complete. With the dependency set between the tasks, the Snapdragon Heterogeneous Compute runtime efficiently schedules them.
Dependencies apply in the data branch as well as in the control branch. Simple APIs in the SDK take whatever data is generated by a task running on one device (CPU, GPU or DSP) and pass it to a task on another device.
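The control and data dependencies described above can be sketched in standard C++. This is only an analogy built on std::async and std::future, not the SDK's API: the names produce, consume and run_dependent_tasks are hypothetical, and both tasks run on CPU threads rather than on different devices.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Task A: produces data (stands in for work on one device).
std::vector<int> produce(int n) {
    std::vector<int> out(n);
    std::iota(out.begin(), out.end(), 1);  // fills 1, 2, ..., n
    return out;
}

// Task B: consumes A's output (stands in for work on another device).
int consume(const std::vector<int>& in) {
    return std::accumulate(in.begin(), in.end(), 0);
}

// Runs A and B asynchronously, with B depending on A.
int run_dependent_tasks(int n) {
    assert(n >= 0);
    // Launch A asynchronously.
    std::future<std::vector<int>> a =
        std::async(std::launch::async, produce, n);
    // B blocks on fa.get(), so its work cannot start before A completes
    // (control dependency); the same call hands A's result to B
    // (data dependency).
    std::future<int> b = std::async(
        std::launch::async,
        [fa = std::move(a)]() mutable { return consume(fa.get()); });
    return b.get();  // synchronization point for the caller
}
```

Calling run_dependent_tasks(4) sums the values 1 through 4 that task A produced. In the SDK the dependency is declared through its task APIs and the runtime does the scheduling; here the future plays both roles.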
When there is a group of related tasks, the group abstraction provided in the SDK bundles them. The application can then wait for the whole group to complete instead of waiting for each task individually, and use that wait to synchronize with the Snapdragon Heterogeneous Compute runtime.
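The idea of waiting on a group rather than on each task can be illustrated with a small stand-in written in standard C++. The task_group class below is hypothetical and is not the SDK's group type; it simply collects std::future objects and assumes every task returns void.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task group; not the SDK's group type.
class task_group {
public:
    // Start a task asynchronously and track it as a group member.
    // Assumes the callable takes no arguments and returns void.
    template <typename F>
    void launch(F&& work) {
        members_.push_back(
            std::async(std::launch::async, std::forward<F>(work)));
    }

    // Block until every member launched so far has finished.
    void wait_for() {
        for (auto& m : members_) m.get();
        members_.clear();
    }

private:
    std::vector<std::future<void>> members_;
};

// Launches n related tasks and waits on the group once.
int run_group(int n) {
    assert(n >= 0);
    std::atomic<int> done{0};
    task_group g;
    for (int i = 0; i < n; ++i) {
        g.launch([&done] { done.fetch_add(1); });
    }
    g.wait_for();        // one wait instead of n individual waits
    return done.load();  // equals n once the group has completed
}
```

The application code in run_group never touches an individual task handle; the single wait_for call is its only synchronization point with the workers.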
Multiple APIs are available as part of the Task infrastructure.
The create_task API can be used to create any Snapdragon Heterogeneous Compute SDK task — CPU, GPU or DSP.
The wait_for API is the synchronization point between the Snapdragon Heterogeneous Compute runtime and the application, much like std::thread::join in standard C++.
The code in the image below shows how to create a task and launch it into the Snapdragon Heterogeneous Compute runtime.
Recall that the code in the affinity sample created a CPU kernel and set the affinity to run it on the big cluster. In the first line of code highlighted here, the create_task API binds the control (the CPU kernel) to the data (vectors a and b) and wraps them in a single abstraction.
In the second highlighted line, that abstraction is launched into the Snapdragon Heterogeneous Compute runtime using the launch task API. The runtime takes the task, understands that it is a CPU kernel and runs it on the CPU, adhering to the affinity settings available in the CPU kernel (if the developer has specified them).
wait_for in the third highlighted line is a synchronization point that waits for the task to complete. Once wait_for returns, it is guaranteed that the vector_double algorithm is complete and the output vector y has every element of the input vector doubled.
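The create, launch and wait steps walked through above can be mimicked in standard C++ as a rough sketch. It uses std::packaged_task and std::thread rather than the SDK's create_task, launch and wait_for calls, ignores affinity, and assumes a vector_double kernel and a double_all helper whose signatures are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// The kernel ("control"): writes 2 * in[i] to out[i].
void vector_double(const std::vector<int>& in, std::vector<int>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = 2 * in[i];
}

// Doubles every element of x using the three steps from the text.
std::vector<int> double_all(const std::vector<int>& x) {
    std::vector<int> y(x.size());

    // 1. Create: bind the kernel to its data; nothing runs yet.
    std::packaged_task<void()> task([&x, &y] { vector_double(x, y); });
    std::future<void> done = task.get_future();

    // 2. Launch: hand the task to a worker thread.
    std::thread worker(std::move(task));

    // 3. Wait: block until the kernel has finished writing y.
    done.wait();
    worker.join();
    return y;
}
```

After done.wait() returns, y is safe to read, which mirrors the guarantee the text attributes to wait_for. In the SDK the runtime, not the application, chooses the thread or device that runs the task.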
Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.