Remember when I posted that our Qualcomm® Robotics RB3 Platform wasn’t very pretty, but it was very smart? Sahaj Sarup, application engineer at 96Boards, proved that by building a seeing, hearing robotic arm around the RB3. This post from Sahaj recaps noteworthy steps on his software and hardware path to getting a 6-degrees-of-freedom (6DOF) arm to work with voice and computer vision (CV). Let’s hear from Sahaj...
What’s the best part of building a robotic arm? Going beyond robotics to make an arm that can also see and hear.
My goal was to build a demonstrative use case for the Qualcomm Robotics RB3 Development Kit. I chose to build a robotic arm, then added OpenCV so it could recognize objects and speech detection so it could process voice instructions. To get 6DOF, I connected the six servomotors in a LewanSoul Robotic Arm Kit first to an Arduino board, then later to an I2C-based servo controller.
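If you want a feel for what driving servos over I2C looks like, here’s a minimal sketch. It assumes a PCA9685-style 16-channel controller on I2C bus 1 at address 0x40 and the smbus2 library; both are assumptions on my part, so check your own controller’s datasheet before borrowing the register map:

```python
import time

from smbus2 import SMBus

# Assumed PCA9685-style register map -- verify against your controller.
I2C_BUS, ADDR = 1, 0x40
MODE1, PRESCALE, LED0_ON_L = 0x00, 0xFE, 0x06

def set_pulse_us(bus, channel, pulse_us, freq=50):
    """Drive one servo channel with a pulse width in microseconds."""
    counts = int(pulse_us * 4096 * freq / 1_000_000)  # 4096 counts per period
    reg = LED0_ON_L + 4 * channel
    bus.write_i2c_block_data(ADDR, reg, [0, 0, counts & 0xFF, counts >> 8])

with SMBus(I2C_BUS) as bus:
    bus.write_byte_data(ADDR, MODE1, 0x10)    # sleep so the prescaler can change
    bus.write_byte_data(ADDR, PRESCALE, 121)  # 25 MHz / (4096 * 50 Hz) - 1
    bus.write_byte_data(ADDR, MODE1, 0x20)    # wake, enable auto-increment
    time.sleep(0.005)                         # let the oscillator settle
    set_pulse_us(bus, 0, 1500)                # 1.5 ms pulse centers the servo
```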

Here’s an overview of how you can do it.
Vision and machine learning for a robot
First, I needed the arm to recognize shapes, detect colors and determine its own position relative to an object, then pick up the object. That meant computer vision and machine learning. Also, I wanted to use a multi-threading library to spread those workloads across multiple CPU cores.
The Qualcomm Robotics RB3 kit is built around the DragonBoard™ 845c development board, based on the Qualcomm® SDA845 SoC and compliant with the 96Boards open hardware specification. I looked at what it would take to implement OpenCV, the open-source computer vision library often used to provide visual inference for machine learning applications, so the RB3 could see and infer. Fortunately, between the board’s close-to-mainline Linux kernel running on Debian Buster and the straightforward OpenCV installation, there’s almost nothing you need to modify.
But I found that OpenMP, my preferred library for multi-threading, wouldn’t work with the combination of OpenCV 4, Python 2 and 3, and Arm64. So I chose OpenCV 3.2 instead, and managed to get all but one of the CPU cores running at 50 percent utilization or lower.
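If you want to check what your own OpenCV build uses before committing to a version, cv2 can tell you:

```python
import cv2

print(cv2.__version__)  # 3.2.x in my setup
# The "Parallel framework" line in the build info shows which threading
# backend (OpenMP, pthreads, TBB, ...) your OpenCV build was compiled with.
info = cv2.getBuildInformation()
print([line.strip() for line in info.splitlines() if "Parallel" in line])
```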
Detecting shapes and colors
Want to see my OpenCV code for tracking objects and detecting shapes and colors? A detect_shape function maps out edges and estimates the number of vertices; based on that, the shape is classified as a triangle, square/rectangle, pentagon or circle. A detect_hsv function allows the RB3 to detect color by separating the HSV color space. (It also gives the x- and y-coordinates for each object detected, which will help guide the robotic arm.) And an overlay function positions the data returned by detect_hsv over the frame as text, as shown in the image.
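Here’s a simplified sketch of how those two functions can work; the approximation threshold and color ranges are illustrative, not the exact values from my scripts:

```python
import cv2
import numpy as np

def detect_shape(contour):
    # Approximate the contour and classify it by vertex count.
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.04 * peri, True)
    sides = len(approx)
    if sides == 3:
        return "triangle"
    if sides == 4:
        return "square/rectangle"
    if sides == 5:
        return "pentagon"
    return "circle"  # many vertices: treat as a circle

def detect_hsv(frame, lower, upper):
    # Mask one HSV color range and return the centroid (x, y)
    # of the largest blob -- the coordinates that guide the arm.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower), np.array(upper))
    # [-2] keeps this working on both the OpenCV 3.x and 4.x return shapes
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    if not contours:
        return None
    c = max(contours, key=cv2.contourArea)
    m = cv2.moments(c)
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
```

Calling detect_hsv(frame, (100, 150, 50), (130, 255, 255)), for example, would return the centroid of the largest roughly blue region in the frame.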

(Like it or not, OpenCV defaults to 640 x 480 pixels on most webcams. If you’re a glutton for punishment, you’re welcome to try implementing 1080p for the corner-case advantages of high-resolution frames. I still like high-resolution cameras, since most of them come with useful features such as fast autofocus, automatic white balance and color correction. I’ve posted notes and workable code blocks for a 1920 x 1080-pixel video stream at 30 frames per second, along with options you can use in imutils and GStreamer. Knock yourself out.)
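If you do go down that road, requesting a higher resolution from OpenCV takes only a few property calls; whether the camera and driver honor the request is another story. A minimal sketch:

```python
import cv2

cap = cv2.VideoCapture(0)
# Ask for 1920x1080 at 30 fps; the backend may silently fall back
# to 640x480 if the camera or pipeline can't deliver it.
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
cap.set(cv2.CAP_PROP_FPS, 30)

ok, frame = cap.read()
if ok:
    print("Got %dx%d" % (frame.shape[1], frame.shape[0]))
cap.release()
```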
Speech recognition meets computer vision on the RB3
Finally, I wanted to activate the Qualcomm Robotics RB3 kit with a voice command like “Hey, July” (or “Hey, Dum-E,” if you’re into that whole Tony Stark thing). Next, I would give it another command like “Pick up the blue rectangle.” It would then run speech recognition on the desired action (pick up), along with the color (blue) and shape (rectangle) of the intended object.
I chose a simple language processor that diffs voice input against stored lists, plus a speech-detection script that uses Google’s Web Speech API to identify the words spoken by the user.
It turns out that you need to import a few libraries to make this all work:
- json — You have to parse data in JSON format to share lists over memcached. That’s because memcached can handle string values only.
- pymemcache — A data-caching and -sharing front end for Python
- speech_recognition — A collection of speech-recognition libraries
- difflib — Mostly for diffing strings, but used here for basic language processing
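Here’s a minimal sketch of how those pieces fit together; the word lists and match cutoff are illustrative rather than the exact values from my script:

```python
import difflib

import speech_recognition as sr

# Illustrative vocabularies -- the real script's lists may differ.
COLORS = ["red", "green", "blue", "yellow"]
SHAPES = ["triangle", "rectangle", "square", "pentagon", "circle"]

def closest(word, choices):
    """Return the closest fuzzy match for a word, or None."""
    hits = difflib.get_close_matches(word, choices, n=1, cutoff=0.6)
    return hits[0] if hits else None

recognizer = sr.Recognizer()
with sr.Microphone() as mic:
    recognizer.adjust_for_ambient_noise(mic)
    audio = recognizer.listen(mic)

# recognize_google() sends the audio to Google's Web Speech API.
command = recognizer.recognize_google(audio).lower()

color = shape = None
for word in command.split():
    color = color or closest(word, COLORS)
    shape = shape or closest(word, SHAPES)

print("Target:", color, shape)
```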
The Qualcomm Robotics RB3 kit runs the main Python script, the shape-detection script and memcached, all separately from one another. That lets the speech-detection script see the x- and y-coordinates of every object the OpenCV script detects, with JSON converting the lists to strings and back again for memcached.
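A stripped-down sketch of that sharing step, assuming pymemcache and illustrative key and field names:

```python
import json

from pymemcache.client.base import Client

mc = Client(("127.0.0.1", 11211))

# In the OpenCV script: publish detections as a JSON string,
# since memcached stores strings only.
objects = [{"shape": "rectangle", "color": "blue", "x": 412, "y": 217}]
mc.set("objects", json.dumps(objects))

# In the voice script: read the string back and decode it.
raw = mc.get("objects")
detected = json.loads(raw) if raw else []
for obj in detected:
    print(obj["shape"], obj["color"], "at", obj["x"], obj["y"])
```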
Okay, but what does it do?
As shown in the video, the camera detects and classifies objects placed on a table. Next, I issue a voice command to pick up one of the objects. Then, the robotic arm works with the camera to track and pick up the object.
Sure, any two-year-old child could do that. But no two-year-old child could pull together such a neat hack. The project is a good starting point for additional functions, an ample source of prototype code and a good initiation to the RB3 and the 96Boards ecosystem.
Your turn
Ready to build your own robotic arm with voice and vision? I’ve published an overview post describing the hardware and software (and wetware) that went into the robotic arm, including the bill of materials and project objectives. You’ll also find a series of detailed posts with code you can examine and explanations of the design choices I made.
Want to one-up me? Try porting Robot Operating System (ROS) to the Qualcomm Robotics RB3 kit or switching from OpenCV on CPU to TensorFlow on the Qualcomm® Hexagon™ DSP.
Send me questions and let me know how your project is going!
Qualcomm Hexagon, Qualcomm Robotics and Qualcomm SDA845 are products of Qualcomm Technologies, Inc. and/or its subsidiaries.