Neural Networks for Computer Vision and Natural Language Processing

Wednesday 11/17/21 08:41am
Posted By Hsin-I Hsu

Snapdragon and Qualcomm branded products are products of
Qualcomm Technologies, Inc. and/or its subsidiaries.

Cooking with Snapdragon is a video on which we recently collaborated with WIRED Brand Lab. The video features exciting yet practical use cases for AI at the edge. The technologies used include object detection for recognizing objects, natural language processing (NLP) for language translation, and sound sensing for identifying an alarming sound. Be sure to check out the video and accompanying article!

Implementing technologies like these requires models called neural networks. Developers use many frameworks to create them, including TensorFlow and, in some cases, frameworks that exchange models using ONNX, a common model interchange format.

Snapdragon® mobile platforms are well-positioned to help developers use these frameworks to deliver on a variety of AI use cases thanks to the Qualcomm® Artificial Intelligence (AI) Engine. The AI Engine refers to the rich features of the Qualcomm® Hexagon™ DSP like its fused AI-accelerator architecture and the new Tensor Accelerator found in the Hexagon 780 Processor. This technology is backed by a full stack of AI developer resources from QDN including:

Let’s take a look at some common machine learning (ML) models and features supported by the AI Engine and its tools for implementing common AI-powered use cases.

Object Detection

In the video we saw how Snapdragon was used to classify and identify a jackfruit and determine whether it’s ripe enough for cooking.

This use case is just one example of the many image processing algorithms that developers can implement on Snapdragon. Image processing algorithms fall under a number of categories including:

  • Image Classification: determines what an image represents (e.g., a face, vehicle, building, etc.). Use cases include: analyzing the state of crops, detecting environmental conditions from satellite imagery, classifying objects on an assembly line, etc.
  • Object Detection and Tracking: locates and identifies one or more types of objects in an image, and can even track those objects across multiple images (e.g., video frames). Use cases include: analyzing security footage, facial recognition, tracking moving objects in autonomous vehicles, etc.
  • Semantic Segmentation: identifies one or more types of objects in an image on a per-pixel basis, allowing for fine-grained object detection. Use cases include: highlighting abnormalities in medical imagery, recoloring objects in pictures, etc.
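To make the difference between per-image and per-pixel output concrete, here is a minimal sketch in plain Python. It assumes a model has already produced raw class scores; the `LABELS` list, the score values, and the helper names (`softmax`, `classify`, `segment`) are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Image classification: one label (and confidence) for the whole input."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

def segment(score_map, labels):
    """Semantic segmentation: one label per pixel.

    score_map[row][col] holds the class scores for that pixel.
    """
    return [[classify(pixel, labels)[0] for pixel in row] for row in score_map]

LABELS = ["background", "jackfruit"]  # hypothetical classes

# Classification: a single score vector describes the entire image.
image_scores = [0.3, 2.1]
label, confidence = classify(image_scores, LABELS)

# Segmentation: every pixel of a 2x2 image carries its own score vector.
pixel_scores = [
    [[2.0, 0.1], [0.2, 1.5]],
    [[1.8, 0.3], [0.1, 2.2]],
]
mask = segment(pixel_scores, LABELS)

print(label, round(confidence, 2))
print(mask)
```

Object detection sits between the two: instead of one label per image or per pixel, a detector typically returns a list of bounding boxes, each with its own label and confidence.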

One of the most common neural networks used for image processing is the Convolutional Neural Network (aka CNN). There are various forms of CNN architectures like LeNet, AlexNet, etc., each suited for different types of image processing tasks.

CNNs work off the same general principle of convolving an input image using filters (aka kernels) to extract one or more feature maps. These feature maps are then iteratively down-sampled through pooling and passed to the next layer to extract higher-level features. The final down-sampled (pooled) feature map is then flattened and used as input to a fully-connected neural network for classification.
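The pipeline described above (convolve, pool, flatten, classify) can be sketched in a few lines of plain Python. This is an illustrative toy, not how the Neural Processing SDK implements these layers: the 5x5 image, the edge-detecting kernel, and the fully connected weights are all hand-picked assumptions, whereas a real network learns its kernels and weights during training.

```python
def conv2d(image, kernel):
    """Stride-1 convolution without padding (cross-correlation, as in most CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Zero out negative responses."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Down-sample by keeping the maximum of each non-overlapping size x size block."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]

def flatten(fmap):
    """Turn a 2D feature map into a 1D feature vector."""
    return [v for row in fmap for v in row]

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Toy 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = relu(conv2d(image, kernel))  # 4x4 feature map
pooled = max_pool(feature_map)             # 2x2 after pooling
features = flatten(pooled)                 # 4 values

# Hypothetical classifier weights: unit 0 sums the left column, unit 1 the right.
weights = [[0.5, 0.0, 0.5, 0.0],
           [0.0, 0.5, 0.0, 0.5]]
scores = dense(features, weights, [0.0, 0.0])

print(pooled)  # the edge shows up in the left half of the pooled map
print(scores)
```

In a full CNN the convolve-and-pool stage is repeated several times with many kernels per layer, so later layers see progressively smaller but more abstract feature maps.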

Our Neural Processing SDK supports a number of useful CNN layers including Convolution, Deconvolution, Depthwise convolution, Fully connected, and Pooling. For a more detailed list, check out the Supported Network Layers page of the API reference guide.

The Computer Vision SDK is designed to help developers implement new user experiences with gesture recognition; face detection, tracking, and recognition; text recognition and tracking; and even augmented reality. The SDK includes a rich library of math routines for these applications, optimized to run on Snapdragon platforms.

Natural Language Processing

The video showed two great examples of on-the-fly NLP. In the first example, a paper-based menu was translated from Spanish to English, and in the second, spoken words were translated in real time. NLP is an exciting area that spans a wide variety of tasks including:

  • Speech/Voice Recognition: understanding spoken speech. Use cases range from smart speakers to voice-activated industrial IoT systems.
  • On-the-fly Translation: interpreting speech in one language and translating it to another language, all in near real time. For example, this can allow verbal communication between online meeting participants who speak different languages.
  • Sentiment Analysis: determining the sentiment of speech or text, such as whether a statement is positive or negative.

The NLP page on Wikipedia provides a great description of these and other types of NLP tasks.

Common ML models for NLP applications are Recurrent Neural Networks (RNNs), including variants such as Long Short-Term Memory (LSTM) networks. LSTMs are well suited to NLP because they can remember values over arbitrary time intervals and regulate the flow of information. The most prominent feature of RNNs is the loop in their recurrent layers: the input to each layer consists of both the data to analyze and the output from the previous calculation performed by that layer. This gives the network memory (i.e., the ability to maintain state across iterations), and its recurrent nature is well suited to sequential data like the series of words found in speech.
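The recurrent loop can be sketched as follows. This is a plain (Elman-style) recurrent layer rather than an LSTM, which adds gates on top of the same idea to control what is kept and what is forgotten. The weights and the two-element "word" vectors are hypothetical values picked for illustration.

```python
import math

def rnn_step(x, h_prev, w_in, w_rec, bias):
    """One time step of a simple recurrent layer.

    The new state mixes the current input x with the layer's own previous
    output h_prev, which is what gives the layer memory.
    """
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[k], x))
            + sum(w * hi for w, hi in zip(w_rec[k], h_prev))
            + bias[k]
        )
        for k in range(len(bias))
    ]

def run_rnn(sequence, w_in, w_rec, bias):
    """Feed a sequence through the layer, carrying the state between steps."""
    h = [0.0] * len(bias)  # the state starts empty
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states

# Hypothetical weights for 2 inputs and 2 hidden units.
W_IN = [[1.0, 0.0], [0.0, 1.0]]
W_REC = [[0.5, -0.3], [0.2, 0.4]]
BIAS = [0.0, 0.0]

word_a, word_b = [1.0, 0.0], [0.0, 1.0]  # toy stand-ins for word vectors

forward = run_rnn([word_a, word_b], W_IN, W_REC, BIAS)
backward = run_rnn([word_b, word_a], W_IN, W_REC, BIAS)

# Same words, different order, different final state.
print(forward[-1])
print(backward[-1])
```

Because the final state depends on the order in which the inputs arrived, the layer can distinguish sequences that contain the same words in a different order, which a purely feed-forward layer looking at one word at a time cannot do.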

The Neural Processing SDK supports LSTM layers, which you can learn more about on the Supported Network Layers page.

Interestingly, NLP can sometimes intersect with image recognition. For example, this use case by PerceptiLabs shows how a MobileNet (another type of CNN) can be used to analyze spectrograms to detect voice commands for applications like voice user interfaces (VUIs).

Sound Sensing

The video also showed how Snapdragon can perform sound sensing to detect very specific sounds. In the video, Snapdragon detected the sound of an overflowing pot and then notified the users via an alarm. Such functionality can be achieved through a variety of techniques, including the CNN-based spectrogram analysis approach that we mentioned above.
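For readers unfamiliar with spectrograms, the sketch below shows one way to turn an audio signal into the 2D time-frequency grid that a CNN can then treat like an image. It only covers the spectrogram step, uses a naive DFT for readability (production code would use an FFT library), and the frame size, hop length, and synthetic two-tone test signal are assumptions made for the example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of the discrete Fourier transform for bins 0 .. N/2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, frame_size=32, hop=16):
    """Short-time Fourier magnitudes: one row per time frame, one column per bin."""
    # Hann window to reduce leakage between neighboring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_size)
              for t in range(frame_size)]
    rows = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_size)]
        rows.append(dft_magnitudes(frame))
    return rows

def peak_bin(row):
    """Index of the strongest frequency bin in one spectrogram row."""
    return max(range(len(row)), key=row.__getitem__)

# Synthetic signal: a low tone followed by a tone at twice the frequency.
low = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
high = [math.sin(2 * math.pi * 8 * t / 32) for t in range(64)]
signal = low + high

spec = spectrogram(signal)

print(len(spec), "frames x", len(spec[0]), "bins")
print("first frame peaks at bin", peak_bin(spec[0]))
print("last frame peaks at bin", peak_bin(spec[-1]))
```

The dominant bin moves from the low tone to the high tone as the frames advance. A distinctive sound such as a boiling-over pot or a spoken command leaves a similarly recognizable pattern in this grid, which is what an image-style CNN would be trained to pick up.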


AI-powered, on-device intelligence at the edge is well supported by our ecosystem of SoCs and tools. In particular, image recognition and NLP are two key areas which support many practical use cases today.

If you’re just getting started, be sure to check out the following tools and SDKs from QDN:

Also be sure to read these two recent blog posts on QDN, which dive deeper into our AI offerings:

And finally, see how two Qualcomm® Advantage Network (QAN) members are using the Hexagon DSP to power inference on edge devices:

For additional inspiration on what you can build by tapping into our ecosystem of AI solutions, be sure to check out our Projects page. And if you’ve built something cool or interesting using any of our technologies, be sure to tell us about it and we may feature it on QDN.

Snapdragon, Qualcomm Artificial Intelligence Engine, Qualcomm Neural Processing SDK, Qualcomm Hexagon, Qualcomm Computer Vision, Qualcomm Machine Vision, Qualcomm QCS410 and Qualcomm QCS610 are products of Qualcomm Technologies, Inc. and/or its subsidiaries. AIMET is a product of Qualcomm Innovation Center, Inc. Qualcomm Advantage Network is a program of Qualcomm Technologies, Inc. and/or its subsidiaries.