Accelerate your models with our OpenCL ML SDK

Thursday 5/13/21 08:53am | Posted by Balaji Calidas

Qualcomm products mentioned within this post are offered by
Qualcomm Technologies, Inc. and/or its subsidiaries.

Are you using OpenCL to run machine learning workloads on the Qualcomm® Adreno™ GPU? Want to optimize your application and improve performance? Download our OpenCL ML SDK and use our OpenCL extension in your development.

Some of you have told us that you’ve written OpenCL libraries for machine learning, and you’re running them on the Adreno GPU. So we’ve added an OpenCL extension, cl_qcom_ml_ops, that lets you take advantage of the ML acceleration we’ve built into the OpenCL driver in our Android image. This is your chance to keep developing with the industry-standard OpenCL API while moving your ML workloads closer to the silicon and getting higher performance from Adreno.

The cl_qcom_ml_ops extension plays nice with OpenCL
As a promoter member of The Khronos Group, Qualcomm Technologies, Inc. (QTI) works with all the major players in the GPU world. We’ve designed the extension for full compatibility with the kernels you develop using OpenCL and we’ve added the performance that comes from our deep knowledge of our Adreno GPU.

Starting with our Adreno 660 GPU in the Qualcomm® Snapdragon™ 888 mobile platform, the extension is engineered to accelerate the most common image processing and ML ops, including these:

  • Convolution
  • Depthwise Separable Convolution
  • Fused Convolution + Activation
  • Activation: Relu, Sigmoid, Tanh, Relu6
  • BatchNorm
  • Pooling: Max, Average
  • GEMM
  • Fully Connected
  • Softmax
  • Binary Operations: add, subtract, mul, min, max
  • Concatenation
  • Depth to Space
  • Permute
  • Reshape
  • Fill
  • ResizeBilinear
  • Pad
  • CopyTensor
  • Transpose

The extension introduces these ML ops as first-class operations, each backed by a kernel optimized for the Adreno GPU. They execute in line with other OpenCL commands on the same queue, and you can use OpenCL events to track their execution. Use the extension to implement an ML model as a sequence of ML ops, linking them by passing the same tensor as the output parameter of one op and the input parameter of the next.
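
That linking pattern can be sketched in plain C. This is a conceptual illustration only, not the cl_qcom_ml_ops API: the `ml_op` struct and `ops_are_linked` helper are hypothetical names used to show how shared tensor handles chain ops into a model.

```c
#include <stddef.h>

/* Hypothetical sketch, not the real extension API: each op names the
   tensor it reads and the tensor it writes. Two consecutive ops are
   "linked" when the first op's output tensor is the second op's input. */
typedef int tensor_id;

typedef struct {
    const char *name;   /* e.g. "conv", "relu", "softmax" */
    tensor_id input;    /* tensor consumed by this op */
    tensor_id output;   /* tensor produced by this op */
} ml_op;

/* Returns 1 when the ops form a chain: op[i]'s input is op[i-1]'s output. */
static int ops_are_linked(const ml_op *ops, size_t n) {
    for (size_t i = 1; i < n; ++i)
        if (ops[i].input != ops[i - 1].output)
            return 0;
    return 1;
}
```

For example, a conv → relu → softmax sequence sharing tensors 1 and 2 at the op boundaries forms a valid chain, while two ops that name different tensors at their boundary do not.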

The extension is compatible with the OpenCL hooks your applications depend on for features like post-processing, controlling performance and managing memory. It takes advantage of standard OpenCL features like command queues, buffers and events, and supports FP16 and FP32 data types.

If you have other OpenCL kernels or custom operations, you can mix them with operations from our extension and dispatch them inline to the same queues with full compatibility.

The advantages of working lower in the stack
Working lower in the stack offers fine-grained control over memory allocation, data movement, execution and synchronization. For example, uploads of weight data and the dispatch of each ML operation are explicitly initiated by the application.

  • To profile ML operations, you can use OpenCL events for details like submit times and GPU execution times.
  • Backing memory for tensors is explicitly controlled by the application, which means the application controls the tensor memory footprint. Backing memory can also be reused across tensors, further reducing the footprint.
  • Since OpenCL ML is a C-based API, your models are effectively more secure because they do not need to be stored in an interpretable file format.
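
The memory-reuse point above can be made concrete with a little arithmetic. This is a sketch of the idea, not the SDK API: the `tensor_desc` type and helper functions are hypothetical, but the principle is real — two tensors whose lifetimes do not overlap (for example, successive intermediate activations in a model) can share one backing allocation sized for the larger of the two.

```c
#include <stddef.h>

/* Hypothetical descriptor: element count and per-element size in bytes
   (2 for FP16, 4 for FP32). */
typedef struct {
    size_t elems;
    size_t elem_size;
} tensor_desc;

static size_t tensor_bytes(tensor_desc t) {
    return t.elems * t.elem_size;
}

/* One shared backing buffer sized max(a, b), instead of two separate
   allocations of tensor_bytes(a) + tensor_bytes(b). */
static size_t shared_backing_bytes(tensor_desc a, tensor_desc b) {
    size_t sa = tensor_bytes(a), sb = tensor_bytes(b);
    return sa > sb ? sa : sb;
}
```

For two FP16 activation tensors of shapes 1x112x112x64 and 1x56x56x128, separate allocations would need 1,605,632 + 802,816 bytes, while a shared backing buffer needs only 1,605,632 bytes, a savings of about a third.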

Plus, we’ve designed the OpenCL ML extension to enable training of ML models on Adreno soon.

What’s inside the OpenCL ML SDK?
Use the header files and documentation in the SDK to modify your applications to call new functions in the OpenCL driver we ship with our Android image.

As for tools, if you’ve used our Qualcomm® Neural Processing SDK, you’re probably familiar with utilities for easily converting, say, a TensorFlow Lite model into another format. With the OpenCL ML SDK, you’re working so close to the silicon that there’s no easy, one-click way to convert models. That’s why we’ve provided a pair of tools for extracting and converting weight tensor data from TensorFlow models:

  • Generate Graph Model Tool — This converts TensorFlow protobuf frozen models (.pb) or TensorFlow Lite (.tflite) models into a TensorFlow Graph Model representation. The resulting Graph Model retains the source model's topology and weight data.
  • Graph Model to QFP16/32 Tool — This extracts the weight tensor as '.qfp16' and '.qfp32' file types, which hold half- and full-precision data respectively.
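
To give a feel for what half-precision weight data involves, here is a minimal FP32-to-FP16 conversion in plain C. This is an illustrative sketch, not the SDK's tool: it assumes the '.qfp16' files hold standard IEEE 754 binary16 values, truncates the mantissa rather than rounding, flushes subnormals to zero, and collapses NaN to infinity, any of which may differ from the real converter.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: convert an IEEE-754 single-precision float to half precision.
   Truncates the mantissa (no rounding); subnormal results flush to zero;
   NaN collapses to infinity. The SDK's actual tool may handle these
   cases differently. */
static uint16_t f32_to_f16(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);                         /* reinterpret bits */
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);  /* sign bit */
    int32_t  exp  = (int32_t)((x >> 23) & 0xFFu) - 127 + 15; /* rebias */
    uint32_t mant = x & 0x7FFFFFu;                    /* 23-bit mantissa */

    if (exp >= 31) return (uint16_t)(sign | 0x7C00u); /* overflow -> inf */
    if (exp <= 0)  return sign;                       /* underflow -> +/-0 */
    return (uint16_t)(sign | (uint16_t)(exp << 10) | (uint16_t)(mant >> 13));
}
```

For instance, 1.0f converts to the binary16 pattern 0x3C00 and -2.0f to 0xC000, at half the storage cost of the FP32 originals.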

The output from those tools is the data that the sample models use. The SDK includes eleven sample models that demonstrate how you can use the cl_qcom_ml_ops extension together with other features of OpenCL on the Adreno GPU. Most of the models are versions of MobileNet for image classification that demonstrate the following:

  • Basic functionality using half-precision floating point (FP16) tensors
  • Single-precision floating point (FP32) tensors on all operations
  • The fully connected op used in place of GEMM and binary ops
  • A custom OpenCL C kernel (created by a developer) interleaved with OpenCL ML operations (from the SDK)
  • Re-using tensor backing memory for a lower memory footprint
  • Operation properties that require tuning cache hits during op creation
  • Recordable queues that record and dispatch the model
  • Interleaving data types (FP16 and FP32 precision) using an FP32 fully connected layer
  • Using tensors with CL buffers backed by ION memory

Also included in the kit are an Inception V3 implementation for image classification and a MobileNet SSD implementation for object detection.

Next step: Download the OpenCL ML SDK
Why write your own OpenCL kernels for ML ops like convolution, fully connected and softmax when QTI has already written them for you and optimized them for the Adreno GPU?

If you develop ML models in OpenCL and want the performance benefits of working close to the silicon, download the OpenCL ML SDK and see what it offers you.


Qualcomm Snapdragon, Qualcomm Neural Processing, and Qualcomm Adreno are products of Qualcomm Technologies, Inc. and/or its subsidiaries.