Using High Performance Vision Processing for IoT

Wednesday 5/9/18 03:00pm
Posted By Shardul Brahmbhatt

Snapdragon and Qualcomm branded products are products of
Qualcomm Technologies, Inc. and/or its subsidiaries.

A myriad of mobile devices today feature built-in cameras that, when paired with onboard processors, provide intelligent image capture and vision processing capabilities. These capabilities are being driven by advances in Computer Vision (CV) and Deep Learning (DL), both of which can rely heavily on data that comes from a camera. Developers who run CV combined with DL at the edge, rather than in the cloud, often gain several advantages in their IoT projects. For example, limiting the amount of data sent to the cloud can reduce both security exposure and latency. Efficiency can also be increased by using triggers on the IoT device to determine which data it can process locally and which data must be sent to the cloud.

Recently, Qualcomm Technologies, Inc. announced our Vision Intelligence Platform, which is purpose-built to facilitate edge and visual computing for machine learning on IoT devices powered by our Qualcomm QCS605 and Qualcomm QCS603 SoCs. We wanted to share this announcement with our developers and provide an example of how to use it to expand the possibilities in your projects.

Organizing a Project

Because DL can be computationally demanding, it's important to organize your project for both accuracy and efficiency. Here is an example using two different techniques:

Technique One (separate CV and DL):

Here, a large buffered image is divided into five images: one is a downscaled version of the whole image, and the other four are its quadrants. DL is used for detection and classification, and then the results from all five images are combined and tracked using CV. When we ran this in our tests using MobileNet SSD, the process improved detection and tracking of distant objects and, with the image divided, was able to track objects at regular intervals.
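The five-way split in Technique One can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in our tests: it assumes each tile is resized to a square detector input (300x300 is a common MobileNet SSD input size), that the detector returns boxes in tile-input pixels, and it uses simple non-maximum suppression (a swapped-in choice) to combine the five sets of detections before handing them to CV tracking. The names `make_tiles`, `to_frame_coords`, and `merge_detections` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


@dataclass
class Tile:
    name: str
    x: int      # left edge of the tile in full-frame pixels
    y: int      # top edge of the tile in full-frame pixels
    sx: float   # full-frame pixels per detector-input pixel, horizontally
    sy: float   # full-frame pixels per detector-input pixel, vertically


def make_tiles(width: int, height: int, input_size: int) -> List[Tile]:
    """One downscaled view of the whole frame plus its four quadrants.

    Assumption: every tile is resized to a square input_size x input_size
    detector input (aspect ratio is not preserved in this sketch).
    """
    half_w, half_h = width // 2, height // 2
    right_w, bottom_h = width - half_w, height - half_h  # absorb odd dimensions
    tiles = [Tile("full", 0, 0, width / input_size, height / input_size)]
    quadrants = (("top_left", 0, 0, half_w, half_h),
                 ("top_right", half_w, 0, right_w, half_h),
                 ("bottom_left", 0, half_h, half_w, bottom_h),
                 ("bottom_right", half_w, half_h, right_w, bottom_h))
    for name, x, y, w, h in quadrants:
        tiles.append(Tile(name, x, y, w / input_size, h / input_size))
    return tiles


def to_frame_coords(tile: Tile, box: Box) -> Box:
    """Map a box from detector-input pixels of a tile back to the full frame."""
    x0, y0, x1, y1 = box
    return (tile.x + x0 * tile.sx, tile.y + y0 * tile.sy,
            tile.x + x1 * tile.sx, tile.y + y1 * tile.sy)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_detections(per_tile: List[Tuple[Tile, List[Tuple[Box, float]]]],
                     iou_threshold: float = 0.5) -> List[Tuple[Box, float]]:
    """Combine detections from all tiles with greedy non-maximum suppression."""
    candidates = [(to_frame_coords(tile, box), score)
                  for tile, dets in per_tile for box, score in dets]
    candidates.sort(key=lambda c: c[1], reverse=True)  # highest score first
    kept: List[Tuple[Box, float]] = []
    for box, score in candidates:
        # Drop a box if it overlaps an already-kept, higher-scoring box.
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score))
    return kept
```

For a 1920x1080 frame, an object seen both in the downscaled full view and in a quadrant maps to the same full-frame box and is kept once (with the higher score), while a small distant object found only in a quadrant survives as a separate detection.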

Technique Two (CV+DL):

This technique uses zoomed-in detection on a combination of five images, where the smaller images are used for detection, classification and tracking. Because images often contain a significant amount of empty or open space, CV+DL allows for monitoring just the region of interest (e.g., a face or a specific object such as a ball). In our video capture testing, we found that monitoring only every fifth or sixth frame improved detection and tracking, even at a distance, by a factor of two.
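One way to read the frame schedule above is: run the comparatively expensive DL detector only on every Nth frame, restricted to a padded region around the last known object position, and use a lightweight CV tracker on the frames in between. The sketch below illustrates that interpretation; it is hypothetical and not the code behind the results above. `detect` and `track` are placeholder callables standing in for a real detector and tracker, and the 25% padding margin is an arbitrary assumption.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def roi_crop(box: Box, frame_w: int, frame_h: int, margin: float = 0.25) -> Box:
    """Pad a box by `margin` of its size on each side, clamped to the frame."""
    x0, y0, x1, y1 = box
    pad_x, pad_y = int((x1 - x0) * margin), int((y1 - y0) * margin)
    return (max(0, x0 - pad_x), max(0, y0 - pad_y),
            min(frame_w, x1 + pad_x), min(frame_h, y1 + pad_y))


def process_stream(frames: Iterable[object],
                   detect: Callable[[object, Box], Optional[Box]],
                   track: Callable[[object, Box], Optional[Box]],
                   frame_w: int, frame_h: int,
                   detect_every: int = 5) -> List[Tuple[str, Optional[Box]]]:
    """Run DL detection every `detect_every` frames (or whenever the target
    is lost) and cheap CV tracking on the frames in between."""
    box: Optional[Box] = None
    history: List[Tuple[str, Optional[Box]]] = []
    for i, frame in enumerate(frames):
        if box is None or i % detect_every == 0:
            # Zoom in: search only near the last known position, else the whole frame.
            if box is not None:
                roi = roi_crop(box, frame_w, frame_h)
            else:
                roi = (0, 0, frame_w, frame_h)
            box = detect(frame, roi)
            history.append(("detect", box))
        else:
            box = track(frame, box)
            history.append(("track", box))
    return history
```

Setting `detect_every` to 5 or 6 corresponds to the frame interval mentioned above. Because detection only sees the padded region of interest, the detector input is effectively a zoomed-in crop, which is what helps with distant objects.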

With our Vision Intelligence Platform, the process for the CV+DL example was optimized due to the heterogeneous compute architecture of our SoC. In particular, the ability to use the DSP for learning at the edge was made possible by our Hexagon Vector Processor, which allows higher utilization across the SoC for improved performance.

The importance of image classification should not be underestimated. Consider the processing involved in the example above, but in a different use case such as a traffic command center that monitors 40,000 video feeds of traffic, pedestrians, vehicles, license plates, collisions and even sound. With so many elements being tracked in real time, accurate object detection is imperative to ensure that only the objects of interest generate alerts for decision making.
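To make the alerting point concrete, here is a minimal sketch of a filtering step in which only detections whose class is on a watch list and whose confidence clears a threshold raise an alert. The `Detection` type, the class labels, and the 0.6 threshold are illustrative assumptions rather than part of any Qualcomm API. The filter can only be as good as the detector feeding it: misclassified objects either slip past it or trigger false alerts, which at 40,000 feeds quickly overwhelms operators.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Detection:
    feed_id: int       # which camera feed produced this detection
    label: str         # class predicted by the detector
    confidence: float  # detector score in [0, 1]


def alerts_for(detections: Iterable[Detection], classes_of_interest: Set[str],
               min_confidence: float = 0.6) -> List[Detection]:
    """Keep only confident detections of classes that operators care about."""
    return [d for d in detections
            if d.label in classes_of_interest and d.confidence >= min_confidence]


def alerts_by_feed(alerts: Iterable[Detection]) -> Dict[int, List[str]]:
    """Group alert labels per feed so each feed raises one consolidated alert."""
    grouped: Dict[int, List[str]] = {}
    for d in alerts:
        grouped.setdefault(d.feed_id, []).append(d.label)
    return grouped
```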

As our test examples show, our Vision Intelligence Platform provides a strong foundation for developer opportunities in IoT. It integrates the Qualcomm AI Engine and supporting software such as the Qualcomm® Snapdragon® Neural Processing Engine (NPE) for on-device AI. The NPE provides analysis, optimization, and debugging capabilities that help developers and OEMs port trained networks to the platform. The AI Engine is compatible with the TensorFlow, Caffe and Caffe2 frameworks, the Open Neural Network Exchange (ONNX) interchange format, the Android Neural Networks API, and the Qualcomm Hexagon Neural Network.