A Four-Phase Strategy for Robotic Vision Processing, Part One

Thursday 5/16/19 09:00am
Posted By Dev Singh

The Evolution of Vision in Robotics

First-generation consumer robots, such as the first robotic vacuum cleaners, were relatively simplistic and had limited abilities to self-navigate and perform tasks. They sensed their surroundings through methods such as barriers built from infrared transmitters and shock sensors that detected collisions. But that has all changed.

With advances in converging technologies like artificial intelligence (AI), machine learning (ML), and computer vision (CV), robots can now see their surroundings, analyze dynamic scenarios or changing conditions, and make decisions. These capabilities are being driven by hardware innovations such as increasingly powerful and intelligent mobile platforms, more sophisticated sensors, and high-resolution image capture.

With these resources at our disposal, developers can now focus on smarter and more autonomous robotics that are less reliant on external hardware (e.g., GPS), work in a larger number of locations than ever before (e.g., indoors, under low light, etc.), and can deal with changing environments and moving objects. This is paving the way for new robotic applications in areas like industrial IoT (IIoT), warehouses and logistics, retail, automotive, agriculture, health, enterprise, and consumer.

To achieve these goals, there are three primary robotic vision challenges that robotics developers should strive to overcome:

  • Determining the orientation of objects: Objects in the surrounding environment must not only be identified, but their orientation in 3D space must also be determined in order for a robot to interact with and/or avoid these objects.
  • Dealing with moving objects: Objects in a given environment may not be static. This means robots need to detect, identify, and track objects over space and time.
  • Navigating: For a robot to be autonomous, it needs algorithms that allow it to move within environments that may change over time.

A Four-Phase Strategy Forward

Depending on the requirements, developers can overcome these challenges by employing a four-phase strategy, or some variation of it, as follows:

  1. Preprocessing: Data is collected from the real world (e.g., from sensors and cameras) and converted into a more usable state.
  2. Feature detection: Features such as corners, edges, etc. are extracted from the preprocessed data.
  3. Object detection and classification: Objects are detected from the features and may be classified according to known feature maps.
  4. Object tracking and navigation: Objects that have been identified are tracked across time. This can include both objects and changing viewpoints of the environment as the robot navigates.

The data generated by these phases can then be used to control servos, make decisions, and perform other high-level robotic tasks.
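To make the hand-off between phases concrete, here is a minimal sketch in plain Python. It is not taken from any Qualcomm SDK: the stage functions (`preprocess`, `detect_features`, `detect_objects`, `track`) are hypothetical stand-ins operating on a single row of pixel values, and a real system would replace each one with far more capable processing.

```python
from typing import Any, Callable, List, Tuple

# A stage takes the previous stage's output and returns its own output.
Stage = Callable[[Any], Any]


def run_pipeline(stages: List[Tuple[str, Stage]], raw: Any) -> Any:
    """Feed raw data through each named stage in order."""
    data = raw
    for _name, stage in stages:
        data = stage(data)
    return data


# Hypothetical placeholder stages (assumptions for illustration only).
def preprocess(samples):
    # Scale raw 8-bit samples into the range 0.0 to 1.0.
    return [s / 255.0 for s in samples]


def detect_features(samples):
    # Treat a large jump between neighbouring samples as an "edge" index.
    return [i for i in range(1, len(samples))
            if abs(samples[i] - samples[i - 1]) > 0.5]


def detect_objects(edge_indices):
    # Pair consecutive edges into (start, end) spans, a stand-in for objects.
    return list(zip(edge_indices[0::2], edge_indices[1::2]))


def track(objects):
    # Report the centre of each span, a stand-in for a tracked position.
    return [(start + end) / 2 for start, end in objects]


PIPELINE = [
    ("preprocessing", preprocess),
    ("feature detection", detect_features),
    ("object detection and classification", detect_objects),
    ("object tracking and navigation", track),
]

# One row of pixels containing a bright object on a dark background.
positions = run_pipeline(PIPELINE, [0, 0, 255, 255, 255, 0, 0])
```

Keeping each phase behind the same simple interface is one way to swap in a hardware-accelerated implementation later without touching the rest of the chain.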

This may sound like a lot of work, and indeed it can be, but thankfully there are frameworks and hardware to help you with this. Qualcomm Technologies, Inc. recently announced the Qualcomm® Robotics RB3 Platform (RB3) based on the Qualcomm® SDA 845 SoC (SDA845), and its associated Qualcomm® Robotics RB3 Development Kit. This kit provides developers with the mobile hardware capabilities and rich tool support to work towards these goals.

In Part 1 of this two-part series on machine vision for robotics, we will look at the first two phases of the strategy, preprocessing and feature detection, and then see how they can be applied using a feature-rich development kit like the Qualcomm Robotics RB3 development kit.


Preprocessing

A robot collects data from the real world using one or more cameras and/or other sensors. However, this raw data may not be in a suitable state for the accurate calculations and predictions required to meet established goals. Here, methods such as digital signal processing (DSP) can be used to “clean” the data to get it into a more usable form. Image data, for example, can be cleaned in numerous ways, including resizing, gamma correction, and contrast enhancement, while sensor data, such as that from the inertial measurement unit (IMU), accelerometer, barometer, and/or microphones on the Qualcomm Robotics RB3 development kit, can be fused, interpolated, and/or filtered.
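As a rough illustration of what this cleaning can look like, the sketch below implements gamma correction and contrast stretching for an 8-bit grayscale image, plus a simple exponential low-pass filter for a stream of sensor readings. It is plain Python rather than any Qualcomm API, the function names are made up for this example, and it assumes the convention that output = input^(1/gamma).

```python
def gamma_correct(image, gamma):
    """Gamma-correct an 8-bit grayscale image given as a list of rows.

    Assumed convention: out = 255 * (in / 255) ** (1 / gamma),
    so gamma > 1 brightens the midtones.
    """
    # Precompute a 256-entry lookup table so each pixel is one lookup.
    lut = [round(255 * (v / 255) ** (1.0 / gamma)) for v in range(256)]
    return [[lut[p] for p in row] for row in image]


def stretch_contrast(image):
    """Linearly rescale pixel values so they span the full 0 to 255 range."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


def low_pass(samples, alpha):
    """Exponential moving average: smooths noisy sensor readings.

    alpha near 1 follows the input closely; alpha near 0 smooths heavily.
    """
    filtered = []
    for s in samples:
        previous = filtered[-1] if filtered else s
        filtered.append(alpha * s + (1 - alpha) * previous)
    return filtered


stretched = stretch_contrast([[50, 100], [150, 200]])
smoothed = low_pass([0, 10, 10], 0.5)
```

In practice, operations like these are good candidates for offloading to dedicated hardware, since they apply the same arithmetic to every pixel or sample.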

When it comes to image data, be sure to plan how much of it you want to collect and how fast. The Qualcomm Robotics RB3 development kit can support two (stereo) images, which means your system must process two image planes at the same time. It can also support various resolution configurations ranging from 16 to 32 megapixels, and frame rates ranging from 30 to 60 fps. Similarly, sensor data can be collected at various frequencies and bit rates, depending on the types of sensors you choose and on whether they use the high-speed or low-speed connectors found on the Qualcomm® SDA845.
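A quick back-of-the-envelope calculation shows why this planning matters. The helper below is only arithmetic on the figures quoted above. The 1.5 bytes per pixel is an assumption (it corresponds to a YUV 4:2:0 layout such as NV12), and the calculation says nothing about which combinations of resolution and frame rate the hardware actually supports in stereo.

```python
def raw_data_rate(cameras, megapixels, fps, bytes_per_pixel):
    """Return the uncompressed image data rate in bytes per second."""
    pixels_per_frame = megapixels * 1_000_000
    return cameras * pixels_per_frame * fps * bytes_per_pixel


# Assumption: 1.5 bytes per pixel, as in a YUV 4:2:0 format such as NV12.
BYTES_PER_PIXEL = 1.5

# Stereo capture at the low and high ends of the quoted ranges.
low_end = raw_data_rate(cameras=2, megapixels=16, fps=30,
                        bytes_per_pixel=BYTES_PER_PIXEL)
high_end = raw_data_rate(cameras=2, megapixels=32, fps=60,
                         bytes_per_pixel=BYTES_PER_PIXEL)

print(f"low end:  {low_end / 1e9:.2f} GB/s")
print(f"high end: {high_end / 1e9:.2f} GB/s")
```

Under these assumptions the raw stream is 1.44 GB/s at the low end and four times that at the high end, before any feature detection has run.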

To reduce the overhead of processing all of this data, you’ll generally want to use the lowest sample rates and resolutions that provide the required amount of data for your application. Furthermore, you should offload this processing to a suitable processor when possible. The Qualcomm SDA845 is compatible with specialized hardware including the Qualcomm® Hexagon™ 685 DSP and Qualcomm® Spectra™ 280 ISP, in addition to the more generalized Qualcomm® Kryo™ 385 CPU and graphics-oriented Qualcomm® Adreno™ 630 GPU.
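One common way to lower resolution when a capture mode delivers more pixels than you need is to average each 2x2 block of pixels into one. The sketch below does this in plain Python for a grayscale image stored as a list of rows. The function name is hypothetical, and on real hardware you would normally let the ISP or an accelerated library do the resizing.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    # Ignore a trailing odd row or column so every block is complete.
    height = len(image) - len(image) % 2
    width = len(image[0]) - len(image[0]) % 2
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            total = (image[y][x] + image[y][x + 1]
                     + image[y + 1][x] + image[y + 1][x + 1])
            row.append(total // 4)  # integer mean of the four pixels
        out.append(row)
    return out


small = downsample_2x([[10, 20, 30, 40],
                       [30, 40, 50, 60]])
```

Each pass cuts the pixel count to a quarter, so every later phase has a quarter of the data to process.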

On the API side, developers can use the Qualcomm® Computer Vision library, which contains a number of hardware-accelerated APIs for image preprocessing. They can also use the Qualcomm® Neural Processing Engine SDK, which has a few image preprocessing APIs for dealing with images in neural networks. Developers also have the option of using the Qualcomm® Snapdragon™ Heterogeneous Compute SDK for additional control over how compute operations are performed.

Feature Detection

With clean data available, features can then be extracted. Four common features that computer vision developers look for in visual data are:

  • Corners: a point-like feature with a local 2D structure
  • Edges: a set of points between two regions
  • Blobs: regions of interest
  • Ridges: a curve with a ridge point
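Edges are perhaps the easiest of these to demonstrate. The sketch below computes the Sobel gradient magnitude of a grayscale image in plain Python; large values mark pixels that sit on the boundary between two regions. This is a generic textbook operator written for illustration, not the API of any particular library, and it simply leaves the one-pixel image border at zero.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude at each interior pixel."""
    h, w = len(image), len(image[0])
    magnitude = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = image[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            magnitude[y][x] = math.hypot(gx, gy)
    return magnitude


# A dark region on the left meeting a bright region on the right.
step_image = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
edges = sobel_magnitude(step_image)
```

Here the response is strong in the two columns that straddle the boundary and zero in the flat regions on either side.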

Wikipedia's article on feature detection in computer vision provides additional information about these features, and lists a number of feature detector algorithms and the types of features that they can detect. The image below shows a number of features that you might detect from visual data:

Feature detection algorithms can require a lot of processing power, but they generally operate on a pixel-by-pixel basis, which makes them suitable for parallel execution on the different processors of the Qualcomm SDA845. Developers can use the feature detection APIs found in the Computer Vision library, which include Harris Corner Detector, FAST, Hough Transform, and other detectors, as well as the library’s maximally stable extremal regions (MSER)-based object detection APIs.
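To show why such detectors parallelize well, here is a simplified Harris corner response in plain Python. It is a sketch under stated assumptions, not the Computer Vision library's implementation: it uses central-difference gradients, an unweighted 3x3 window instead of a Gaussian, and the commonly used constant k = 0.04. Note that each output pixel depends only on a small neighborhood of the input, so the pixels can be computed independently of one another.

```python
def harris_response(image, k=0.04):
    """Return the Harris response R = det(M) - k * trace(M)^2 per pixel.

    R is large and positive at corners, negative along edges,
    and close to zero in flat regions.
    """
    h, w = len(image), len(image[0])

    # Central-difference gradients; the one-pixel border stays at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0

    response = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sum the structure tensor entries over a 3x3 window.
            sxx = syy = sxy = 0.0
            for j in range(y - 1, y + 2):
                for i in range(x - 1, x + 2):
                    sxx += ix[j][i] * ix[j][i]
                    syy += iy[j][i] * iy[j][i]
                    sxy += ix[j][i] * iy[j][i]
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[y][x] = det - k * trace * trace
    return response


# A bright square occupying the lower-right part of a 9x9 image,
# which places a single corner at row 4, column 4.
corner_image = [[100 if (x >= 4 and y >= 4) else 0 for x in range(9)]
                for y in range(9)]
response = harris_response(corner_image)
```

In this test image the response is positive at the corner of the square, negative along its straight edge, and zero in the flat background, which is the behavior a corner detector then thresholds.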


These first two phases provide a strong foundation for vision processing in robotics. Preprocessing gets the data into a usable form, while feature detection begins the process of understanding that data. In Part 2 of this two-part series, we’ll look at how the final two phases, object detection and classification and object tracking and navigation, provide robots with the data necessary to navigate and interact with their surroundings.

Developers who are interested in moving forward with next-generation robotics can purchase the Qualcomm Robotics RB3 Development Kit from Thundercomm and start taking advantage of the various SDKs available on QDN.