Accelerate your machine learning networks using TVM and the Adreno OpenCL ML APIs on Adreno GPUs

Tuesday 9/6/22 08:26am | Posted by Siva Rama Krishna Reddy B

Qualcomm products mentioned within this post are offered by
Qualcomm Technologies, Inc. and/or its subsidiaries.

You like running your machine learning (ML) workloads with the Qualcomm Adreno OpenCL ML SDK on Adreno GPUs. But you also want to use the optimized kernel library for Adreno with the kind of end-to-end solution that comes from the Tensor Virtual Machine (TVM) compiler infrastructure. Can you get the best of both worlds?

Now you can. We’ve taken the Adreno OpenCL Machine Learning SDK and integrated it into the open-source TVM compiler. The integration is designed so that developers using TVM as a compiler framework can get better performance when they run ML workloads on Adreno GPUs.

So, if you’re running deep learning applications on Adreno, this is an ideal combination: the hardware-level performance of OpenCL ML, and the high-level optimizations and flexible framework of TVM. With the accelerated OpenCL ML library now integrated into the TVM compiler framework, you’ll get optimized performance when Adreno is your target platform.

Using the GPU for machine learning
Up to now, there have been several separate ways to accelerate machine learning using Adreno GPUs with Snapdragon technology.

First, there’s the Qualcomm Neural Processing SDK. As a mature solution for ML workloads on edge computing, this proprietary, closed-source SDK has achieved tremendous success. It provides customers with a large suite of tools and SDKs to accelerate neural networks using all available computing devices on Snapdragon, including the CPU, GPU and DSP. Manufacturers and developers have adopted the Qualcomm Neural Processing SDK because of its high commercial quality.

Then, some advanced developers prefer to run their ML workloads specifically on Adreno GPUs. They can take advantage of the recently released Qualcomm Adreno OpenCL ML SDK for customization, flexibility and acceleration on Adreno. The Adreno OpenCL ML SDK is based on OpenCL 2.0, an open, widely adopted standard for parallel programming of heterogeneous systems. It includes several hand-optimized OpenCL kernels, written by Adreno GPU experts, that use Adreno-specific hardware features for ML operators. The Adreno OpenCL ML SDK offers the full computing power of Adreno GPUs and exposes OpenCL optimizations such as 32-bit floating-point precision and texture memory support.

Finally, you can use TVM, an open-source compiler framework for deep learning workloads. TVM can automatically generate several OpenCL kernel implementations for a given ML operation or layer, then use an ML-based tuning methodology to find the best-performing OpenCL kernels in a large search space. TVM performs op-level and graph-level optimizations on ML models to generate high-performance OpenCL kernel implementations for a wide range of hardware back ends. And, because it’s open source, TVM is backed by a large, active community with members from both industry and academia.

Get the best of all worlds: OpenCL ML performance and TVM flexibility
What we’re announcing now is support for the Adreno OpenCL ML SDK in TVM through Bring Your Own Codegen (BYOC). The TVM community introduced BYOC as a way of embedding high-performance kernels from vendor acceleration libraries (like the Adreno OpenCL ML SDK) into the main code generated by TVM. So, we’re taking advantage of BYOC to integrate the Adreno OpenCL ML SDK into TVM for an end-to-end solution.
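To make the idea concrete, here is a minimal, pure-Python sketch of the partitioning decision BYOC makes. This is an illustration only, not TVM’s actual implementation; the operator names, the supported-op set and the `partition` helper are all hypothetical:

```python
# Illustrative sketch of BYOC-style graph partitioning (not TVM's actual
# implementation): operators supported by the vendor library are grouped
# into accelerated regions; everything else falls back to the default
# TVM backend.

# Hypothetical set of operators the vendor library accelerates.
CLML_SUPPORTED = {"conv2d", "batch_norm", "dense", "relu", "softmax"}

def partition(graph):
    """Split a linear operator sequence into (backend, ops) regions."""
    regions = []
    for op in graph:
        backend = "clml" if op in CLML_SUPPORTED else "tvm"
        # Merge consecutive ops that target the same backend into one region.
        if regions and regions[-1][0] == backend:
            regions[-1][1].append(op)
        else:
            regions.append((backend, [op]))
    return regions

model = ["conv2d", "batch_norm", "relu", "space_to_depth", "dense", "softmax"]
print(partition(model))
# → [('clml', ['conv2d', 'batch_norm', 'relu']),
#    ('tvm', ['space_to_depth']),
#    ('clml', ['dense', 'softmax'])]
```

The key point the sketch shows is that partitioning happens per region, not all-or-nothing: the unsupported operator in the middle simply stays on the default backend.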

The thing is, although the Adreno OpenCL ML SDK is powerful, its proprietary APIs come with a learning curve. This integration is simpler than using the Adreno OpenCL ML SDK on its own because you don’t need to study the OpenCL ML specification, its header files, or which APIs to call. The integration lets you start using OpenCL ML on day one, without having to learn the entire API definition.

If you know how to use TVM, you can enable the BYOC feature for the Adreno OpenCL ML SDK for the layers it supports. TVM will then generate a network that uses the high-performance kernels from the Adreno OpenCL ML SDK. Your users will get the performance of hand-optimized kernels that exploit Adreno features TVM does not know about or expose.

This integration is designed to easily import deep learning models from the frameworks that TVM supports, such as TensorFlow, PyTorch, Keras, CoreML, MXNet and ONNX. It makes use of TVM’s graph-level optimizations and of the Adreno OpenCL ML library kernels as much as possible. For any kernels or operators not supported by the Adreno OpenCL ML SDK, BYOC provides a fallback to any back end supported by TVM.

How to compile a model in TVM with OpenCL ML
As shown below, you can now:

  • Take an ML model that you have trained
  • Import it using the TVM compiler framework
  • Instruct the TVM compiler to use the OpenCL ML SDK accelerated path for Adreno

Start by importing the CLML Python front end:

from tvm.relay.op.contrib import clml

Next, instruct the compiler to use the CLML acceleration path by calling the following API on the relay module instance:

mod = clml.partition_for_clml(mod, params)

Optionally, you can check the availability of CLML support in the TVM compiler with the following query API:

clml.is_clml_runtime_enabled()
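Putting the three steps together, a compile script might look like the sketch below. The ONNX model path, input shapes and Android target triple are placeholders, and TVM must be built with CLML support; the TVM imports are deferred into the function so the sketch itself stays self-contained:

```python
def compile_with_clml(onnx_path, shape_dict):
    """Sketch: compile an ONNX model for Adreno via the CLML path.

    Assumes a TVM build with CLML support and an Android/Adreno target;
    the path, shapes and target strings are placeholders.
    """
    # Deferred imports: TVM and ONNX are only needed at call time.
    import onnx
    import tvm
    from tvm import relay
    from tvm.relay.op.contrib import clml

    # Step 1: import the trained model into the TVM compiler framework.
    mod, params = relay.frontend.from_onnx(onnx.load(onnx_path), shape_dict)

    # Step 2: offload supported layers to the OpenCL ML kernels.
    if clml.is_clml_runtime_enabled():
        mod = clml.partition_for_clml(mod, params)

    # Step 3: compile the remainder with TVM's OpenCL backend for Adreno,
    # cross-compiling for an Android host.
    target = tvm.target.Target(
        "opencl -device=adreno", host="llvm -mtriple=aarch64-linux-android"
    )
    with tvm.transform.PassContext(opt_level=3):
        return relay.build(mod, target=target, params=params)
```

The resulting library can then be deployed to the device with TVM’s usual runtime workflow.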

The diagram below illustrates flow and relationships in the integration:

[Diagram: TVM BYOC framework]

The end-to-end solution gives you performance on par with commercial solutions like our Qualcomm Neural Processing SDK.

Results of integrating Adreno OpenCL ML SDK with TVM
A significant result is the performance boost we have seen in our internal testing on MobileNetV1: about 2x for FP32 and about 4x for FP16.

The next result is the list of layers supported in the first release of the integration:

  • Convolution
  • Batchnorm
  • Dense
  • Pad
  • Clip
  • ReLU family
  • Global Average/Max Pool2D
  • Softmax
  • Reshape

Those layers cover most of the operators in well-known neural networks, so most of a typical model can be offloaded.
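As a rough illustration of that coverage, you can compare a model’s operator trace against the supported set. The operator names and the trace below are hypothetical, simplified stand-ins, not an actual MobileNetV1 graph:

```python
# Supported layer families from the first release, simplified to op names
# (hypothetical naming, for illustration only).
CLML_LAYERS = {
    "conv2d", "batch_norm", "dense", "pad", "clip", "relu",
    "global_avg_pool2d", "global_max_pool2d", "softmax", "reshape",
}

# Hypothetical operator trace with one unsupported op mixed in.
model_ops = (
    ["conv2d", "batch_norm", "relu"] * 5
    + ["space_to_depth", "global_avg_pool2d", "dense", "softmax"]
)

# Ops in the supported set can be offloaded; the rest fall back to TVM.
offloaded = [op for op in model_ops if op in CLML_LAYERS]
coverage = len(offloaded) / len(model_ops)
print(f"{len(offloaded)}/{len(model_ops)} ops offloaded ({coverage:.0%})")
# → 18/19 ops offloaded (95%)
```

Even with an unsupported operator in the graph, the BYOC fallback keeps the model runnable end to end.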

We have also placed sample tests in the TVM repo, so you can try out the OpenCL ML API usage in the TVM compiler for yourself.

Finally, our OpenCL ML integration into TVM is open source and upstreamed as part of the TVM repo.

Next steps
Our OpenCL ML integration with TVM is on GitHub, waiting for you to try. We think you will enjoy the hardware-level performance of OpenCL ML combined with the high-level optimizations and flexible framework of TVM — all on the Adreno GPU.

Besides publishing this first version of the integration in the open-source TVM compiler framework, we plan to contribute performance improvements and feature updates back to the TVM community. Newer versions of the Adreno OpenCL ML SDK will likewise be integrated and upstreamed. We are answering questions about the integration in our support forum. Watch this space for updates!


Snapdragon, Qualcomm Adreno, and Qualcomm Neural Processing SDK are products of Qualcomm Technologies, Inc. and/or its subsidiaries.