Kernels

Units of work to be executed

Variants of kernels

In heterogeneous computing, a kernel is the unit of work, that is, the computation to be executed on the different cores. Kernels bridge the gap between how an algorithm or function runs in the application and how it runs under the Qualcomm® Snapdragon™ Heterogeneous Compute SDK runtime and its other features. In principle, any existing algorithm or function can be encapsulated in a kernel abstraction and used with other features in the SDK.

The Snapdragon Heterogeneous Compute SDK supports three variants of kernel abstraction:

  • CPU kernel — Encapsulates any function or program construct that runs on the CPU, like C++ functions, lambda expressions, or function pointers
  • GPU kernel — Encapsulates any existing OpenCL or OpenGL kernels
  • DSP kernel — Encapsulates any kernel written using the Qualcomm® Hexagon™ SDK

The kernel abstractions also give developers a place to set attributes or other hints that tell the heterogeneous compute runtime whether a particular function is blocking. In an application that performs I/O or network operations, for example, the Snapdragon Heterogeneous Compute runtime can optimize execution and schedule work based on the blocking nature of the APIs. These attributes can be set on the CPU kernel.

The Snapdragon Heterogeneous Compute SDK supports two execution models for CPU/GPU/DSP variants:

  • Poly kernel — Developers write all three variants of the kernel, then allow the Snapdragon Heterogeneous Compute runtime to determine which variant to execute, based on system characteristics like load or power constraints.
  • Point kernel — Developers write all three variants of the kernel to run everywhere. The recommendation is to use a point kernel if the app is highly data-parallel and it is necessary to extract the maximum amount of parallelism for best performance.
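The poly kernel idea, one variant per device with a choice made at run time, can be sketched in plain C++. This is only an illustration under stated assumptions: `device`, `system_state`, `poly_kernel_sketch`, `choose_device`, and `run_poly` are hypothetical names, the selection policy is invented for the example, and none of it is the SDK's API or its actual scheduling logic. All three variants here are ordinary CPU callables standing in for real CPU, GPU, and DSP code.

```cpp
#include <functional>

// Hypothetical device identifiers, for illustration only.
enum class device { cpu, gpu, dsp };

// Hypothetical snapshot of the system characteristics a policy might inspect.
struct system_state {
    bool cpu_busy;   // stands in for "load"
    bool low_power;  // stands in for "power constraints"
};

// Hypothetical stand-in for a poly kernel: one variant of the same
// computation per device. This is NOT an SDK type.
struct poly_kernel_sketch {
    std::function<int(int)> cpu_variant;
    std::function<int(int)> gpu_variant;
    std::function<int(int)> dsp_variant;
};

// Toy selection policy (an assumption, not the SDK's policy):
// prefer the DSP under power constraints, the GPU when the CPU is loaded,
// and the CPU otherwise.
device choose_device(const system_state& s) {
    if (s.low_power) return device::dsp;
    if (s.cpu_busy) return device::gpu;
    return device::cpu;
}

// Runs whichever variant the toy policy selects.
int run_poly(const poly_kernel_sketch& k, const system_state& s, int x) {
    switch (choose_device(s)) {
        case device::dsp: return k.dsp_variant(x);
        case device::gpu: return k.gpu_variant(x);
        case device::cpu: return k.cpu_variant(x);
    }
    return k.cpu_variant(x);  // not reached; keeps compilers quiet
}
```

The point of the sketch is the division of labor: the developer supplies equivalent variants, and the choice among them is left to a policy that sees system state the application does not.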

Code sample

The code in the image below shows how to create a kernel from an existing algorithm in an application.

The vector_double API in the highlighted lines below doubles the value of an input vector, then stores the result as an output vector.

[Image: code sample defining the vector_double function]
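Because the original listing is an image, here is a minimal sketch of what a function like vector_double might look like. The signature, the use of `std::vector<int>`, and the requirement that the output already has the same size as the input are all assumptions made for this example and are not taken from the SDK sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed signature: the original sample is not reproduced here.
// Writes 2 * in[i] into out[i] for every element of the input.
void vector_double(const std::vector<int>& in, std::vector<int>& out) {
    // The caller is expected to provide an output vector of matching size.
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = 2 * in[i];
    }
}
```

Nothing in this function refers to the SDK, which is the point: it is ordinary application code that exists before any kernel abstraction is applied.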

The highlighted lines below create a kernel abstraction by calling the create_cpu_kernel API and passing in the vector_double function. The result would be a CPU kernel abstraction, ready to use with other features in the SDK.

[Image: code sample calling create_cpu_kernel with vector_double]
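To illustrate what wrapping a function in a kernel abstraction means, the following sketch uses a small stand-in written in plain C++ so that it compiles without the SDK. It does not call create_cpu_kernel; `cpu_kernel_sketch` and `make_cpu_kernel_sketch` are hypothetical names invented for this example, and the `vector_double` signature is likewise an assumption. The real API's types and parameters should be taken from the SDK documentation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Assumed application function: doubles each input element into out.
void vector_double(const std::vector<int>& in, std::vector<int>& out) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = 2 * in[i];
    }
}

// Hypothetical stand-in for a CPU kernel abstraction; NOT the SDK's type.
// It stores any callable (function pointer, lambda, functor) and invokes it
// with whatever arguments are passed later.
template <typename Fn>
class cpu_kernel_sketch {
public:
    explicit cpu_kernel_sketch(Fn fn) : fn_(std::move(fn)) {}

    template <typename... Args>
    void operator()(Args&&... args) {
        fn_(std::forward<Args>(args)...);
    }

private:
    Fn fn_;
};

// Hypothetical factory that mirrors the shape of a "create kernel" call:
// pass in an existing function, get back an object that wraps it.
template <typename Fn>
cpu_kernel_sketch<Fn> make_cpu_kernel_sketch(Fn fn) {
    return cpu_kernel_sketch<Fn>(std::move(fn));
}
```

With these definitions, `auto k = make_cpu_kernel_sketch(vector_double);` produces a wrapper, and `k(in, out);` runs the original function unchanged. In the SDK, the corresponding kernel object is what gets handed to the runtime rather than being called directly.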

Similarly, create_gpu_kernel and create_dsp_kernel are available. (Documentation on all APIs is available in the Heterogeneous Compute SDK - Documentation and Interface Specification.)


Qualcomm Snapdragon and Qualcomm Hexagon are products of Qualcomm Technologies, Inc. and/or its subsidiaries.