Qualcomm products mentioned within this post are offered by
Qualcomm Technologies, Inc. and/or its subsidiaries.
In our quest to simplify heterogeneous programming for you, we’ve released a C++11/14 compiler and libraries for convolutional networks. Download the Qualcomm® Hexagon™ SDK 3.1 to start working with them.
A lot of modern apps developed for the CPU are written in C++11/14, and most of you are comfortable in that environment. Instead of having to port your algorithms to straight C language to adapt them to the DSP, you can use the C++ compiler in Hexagon SDK 3.1 to take advantage of the DSP without modifying your source code.
Now that Hexagon SDK 3.1 supports C++11/14, you can partition your existing code into sections that can execute as modules, and then evaluate their performance on the DSP. You can incorporate the segmented code into the CPU-offload example template, then compile it using the DSP’s C++ compiler. In some situations, it might make sense to migrate the entire code base to the DSP. With the C++11/14 compiler, you shouldn’t need to recode; minimal source code changes should be enough to recompile on the DSP.
That can open up a lot of opportunities if you work on compute-intensive applications like computer vision and machine learning. You know that heterogeneous programming is your ticket to decent performance on mobile hardware, but you don’t look forward to recoding your algorithms from the comfortable environment of C++ to C.
With Hexagon SDK 3.1 you don’t have to.
A CPU-Like Programming Model For The Hexagon DSP
We realized that some developers get tripped up by the programming model behind heterogeneous computing. They aren’t sure how to partition an algorithm between the CPU and the DSP.
The Hexagon DSP is designed to simplify that because it acts as more of a peer to the CPU.
DSPs have traditionally been deeply embedded with a very small memory space, so they couldn’t accommodate the memory requirements of C++. But on soon-to-be-released devices powered by the Qualcomm Snapdragon™ 835 processor, the compute DSP is engineered with a memory management unit (MMU), an L2 cache and full access to double data rate (DDR) memory, which can accommodate larger program and data sizes.
That means a CPU-like programming model with threads and a cache. The DSPs in the latest Snapdragon SoCs can execute out of DDR, as a CPU does, and get around the memory limitations that characterize traditional DSPs. The Hexagon RTOS has also added the necessary POSIX thread conventions to support the C++ programming model. Hexagon can still operate in non-cache mode in local memory for low-power tasks like low-level sensor functions. But executing out of DDR makes more sense for the algorithms and large data sets in use cases like computational camera, computer vision and machine learning. It’s easier to program and compile in C++11/14 than to port your algorithms to C to run on the DSP. The beauty of this approach is that you can mix cached execution with DDR-based execution models depending on your code requirements.
Move Entire Frameworks From The CPU To The DSP
In fact, with C++11/14 support in Hexagon SDK 3.1, you can move an entire framework that uses C++ (think OpenCV and OpenVX, TensorFlow, Caffe or Torch) to the DSP and isolate it from the CPU for both concurrency and power consumption, if the system design calls for it. Otherwise, selected high-performance segments of code can be migrated, while execution of the overall framework code stays on the CPU.
Take TensorFlow, an open source library often used in machine learning, or OpenCV and OpenVX, used in computer vision. The usual implementation is to migrate isolated libraries to the DSP, then have the algorithm on the CPU tell the DSP when to run those libraries. That gets the compute-intensive tasks off the CPU and boosts performance, but what if you’re developing an always-on application? As new data comes in, the app wants to analyze it constantly, so the CPU has to stay powered and running all the time while doing almost no work.
For an always-on application, it’s better to move the entire app and framework to the DSP. With C++ support, frameworks like TensorFlow and OpenCV can run on the DSP in the background with the CPU suspended. Without C++ support, some code needed to stay active on the CPU, consuming power. Either that, or the code had to be rewritten in C, which is costly and leaves an additional code branch to maintain over time.
In the past, Hexagon could act only as an accelerator; now it is designed to host an entire framework on the DSP. You can move an algorithm or a complete framework onto the Hexagon DSP and compile it there without extensive recoding for a heterogeneous programming model.
Get A Head Start
Download the Hexagon SDK 3.1 now and see how you can move a wider variety of code traditionally run on the CPU to the DSP, without the need to rewrite all of your code. We think this will shorten the development process and reduce your porting effort.
The SDK is also your chance to get a head start on working with the Hexagon DSP, which will be in devices expected to be commercially available in the second half of 2017. You can develop your algorithms for image processing, computer vision and machine learning in C++11/14 and run them on the included simulator until handsets with the Snapdragon 820 and 835 are available.
Let me know in the comments below what else you need to get started.