With advances toward semi- and fully autonomous machines such as robots, drones, and passenger vehicles, precise and reliable path finding and obstacle avoidance have become a major focus for developers.
Back in part 2 of our Robotics Vision blog, we briefly introduced simultaneous localization and mapping (SLAM), a set of techniques for mapping an unknown environment and tracking a machine’s location within that environment. The "L" in the acronym refers to estimating the current location of the machine, while the "M" refers to mapping the surrounding environment.
The key goals of SLAM are to augment path planning and navigation, and to ultimately avoid collisions with other objects as a machine travels through its environment. Given these goals, a good SLAM solution is considered a fundamental requirement for autonomous machines such as self-driving cars.
In this blog, we’ll briefly look at traditional approaches to SLAM, and then focus on how advances in machine vision are being used to provide richer, machine vision-based SLAM techniques.
Existing Tracking Approaches and Challenges
There have been a number of tracking solutions over the years, such as those that rely on external helpers (e.g., markers) to assist in localization and mapping. SLAM seeks to solve a slightly different tracking problem, namely, localization and mapping in an unknown and potentially dynamic environment. That means it cannot rely on external helpers.
SLAM can be implemented in many ways depending on requirements such as the desired level of precision, hardware availability, and others. In the past, a number of well-known approaches and technologies have been employed, but each has its limits. For example:
- GPS can provide localization, but only works outdoors. Also, GPS can be inaccurate and blocked by large objects such as buildings.
- Tactile sensors can "feel" the surrounding environment, but only in the area near the machine.
- Tracking using odometry, such as wheel rotation counts, can be unreliable due to environmental conditions such as slippery surfaces.
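To make the odometry limitation concrete, here is a minimal Python sketch of dead reckoning for a two-wheeled (differential-drive) machine. The function name `integrate_odometry`, the wheel dimensions, and the tick counts are illustrative assumptions, not values from any particular robot. It shows how a wheel that spins slightly on a slippery surface makes the estimated pose drift away from the true path.

```python
import math

# All dimensions below are hypothetical values chosen for illustration.
TICKS_PER_REV = 360      # encoder ticks per wheel revolution
WHEEL_RADIUS_M = 0.05    # wheel radius in meters
WHEEL_BASE_M = 0.30      # distance between the two wheels in meters


def integrate_odometry(pose, left_ticks, right_ticks):
    """Advance an (x, y, heading) pose using wheel rotation counts."""
    x, y, heading = pose
    meters_per_tick = 2.0 * math.pi * WHEEL_RADIUS_M / TICKS_PER_REV
    d_left = left_ticks * meters_per_tick
    d_right = right_ticks * meters_per_tick
    d_center = (d_left + d_right) / 2.0
    d_heading = (d_right - d_left) / WHEEL_BASE_M
    # Move along the average heading over this step (midpoint approximation).
    x += d_center * math.cos(heading + d_heading / 2.0)
    y += d_center * math.sin(heading + d_heading / 2.0)
    return (x, y, heading + d_heading)


# Assume the machine really drives straight along the x axis for 10 steps.
ideal = (0.0, 0.0, 0.0)
slipped = (0.0, 0.0, 0.0)
for _ in range(10):
    ideal = integrate_odometry(ideal, 100, 100)
    # The left wheel slips and reports 5 extra ticks per step.
    slipped = integrate_odometry(slipped, 105, 100)

print("no slip:  ", ideal)
print("with slip:", slipped)  # heading and y drift away from zero
```

Because each step builds on the previous estimate, the error from the slipping wheel accumulates rather than averaging out, which is why odometry alone is rarely trusted over long distances.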
Developers also contend with other challenges, including sensor drift, environmental effects on sensors, extremely complex environments, calibration/recalibration needs, and "kidnapping" (i.e., when the device is relocated by someone). On top of this, there are project-level requirements to satisfy, such as cost, size and weight restrictions, processing power, and safety considerations.
Localization and Mapping: A Chicken and Egg Problem
With SLAM, we often talk about localization as estimating the machine’s pose, and mapping as determining the environment’s structure. Together these form the outputs of a SLAM system.
Pose estimation refers to calculating the position and orientation (i.e., yaw, pitch, and/or roll) of the machine within an environment of a given dimensionality (typically 2D or 3D).
The purpose of the pose is to establish spatial relationships between the machine and its environment, so an up-to-date pose is an important input to mapping. The orientation is necessary for establishing the machine’s heading and can be based on variables such as accelerometer and gyroscope readings. The pose is also important for estimating the locations of surrounding features so that the machine can keep track of the environment’s state as it changes.
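As a rough illustration of how a pose establishes a spatial relationship between the machine and its environment, the Python sketch below stores a position plus yaw, pitch, and roll, and uses them to convert a feature seen in the machine's own frame into world coordinates. The `Pose` class, the Z-Y-X rotation order, and the x-forward axis convention are assumptions made for this example; real systems may use different conventions or quaternions.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    """Hypothetical 6DoF pose: position in meters, angles in radians."""
    x: float
    y: float
    z: float
    yaw: float
    pitch: float
    roll: float

    def rotation(self):
        """3x3 rotation matrix (machine frame to world frame), Z-Y-X order."""
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]

    def to_world(self, point):
        """Convert a point observed in the machine frame to world coordinates."""
        rot = self.rotation()
        offset = (self.x, self.y, self.z)
        return tuple(
            offset[i] + sum(rot[i][j] * point[j] for j in range(3))
            for i in range(3)
        )


# The machine sits at (2, 1, 0) and has turned 90 degrees to its left.
pose = Pose(x=2.0, y=1.0, z=0.0, yaw=math.pi / 2, pitch=0.0, roll=0.0)
# A feature 3 m straight ahead of the machine (x-forward is an assumption).
feature_world = pose.to_world((3.0, 0.0, 0.0))
print(feature_world)
```

If the pose is stale or wrong, every feature converted this way lands in the wrong place on the map, which is why an up-to-date pose matters so much to mapping.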
Structure involves mapping the surrounding environment and/or objects. This starts by defining the origin of the world coordinate space around which to build the map. Developers often set the origin to the machine’s initial position or some environmental feature, and once this has been identified at runtime, mapping can begin.
There are many different mapping technologies and techniques that can be used, including LIDAR and point clouds. Mapping is also often done incrementally as additional data is collected over time. Since sensors can provide noisy information, a process called loop closure, in which the machine recognizes a previously visited location and adjusts the map accordingly, is used over time to help correct for issues such as sensor drift.
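The incremental side of mapping can be sketched in a few lines of Python. The example below assumes a simplified 2D range sensor that returns (range, bearing) pairs, and it accumulates the resulting points into a sparse grid of occupied cells as the machine moves. The function names, the cell size, and the scan values are all hypothetical, and the sketch deliberately leaves out loop closure and noise handling.

```python
import math

CELL_SIZE_M = 0.5  # hypothetical map resolution


def scan_to_world(pose, scan):
    """Convert (range, bearing) readings into world-frame (x, y) points."""
    x, y, heading = pose
    return [
        (x + rng * math.cos(heading + bearing),
         y + rng * math.sin(heading + bearing))
        for rng, bearing in scan
    ]


def add_scan(occupied, pose, scan):
    """Mark the grid cell containing each scanned point as occupied."""
    for wx, wy in scan_to_world(pose, scan):
        cell = (math.floor(wx / CELL_SIZE_M), math.floor(wy / CELL_SIZE_M))
        occupied.add(cell)
    return occupied


occupied = set()
# First scan from the origin: something is 5.2 m straight ahead.
add_scan(occupied, (0.0, 0.0, 0.0), [(5.2, 0.0)])
# Second scan after moving forward 2.2 m: the same object is seen again,
# plus a new object 2.2 m to the left.
add_scan(occupied, (2.2, 0.0, 0.0), [(3.0, 0.0), (2.2, math.pi / 2)])
print(sorted(occupied))
```

Re-observing the same object from a new position adds nothing new to the map, while the new object extends it. In a real system the pose passed to `add_scan` is itself an estimate, so pose errors would smear the map until something like loop closure corrects them.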
With SLAM, pose estimation often depends on knowing the current structure mapping and vice versa, so SLAM is sometimes described as a chicken-and-egg problem. To resolve this circular dependency, many SLAM methods have been devised, with one of the more common approaches being the Kalman Filter.
Inputs to a Kalman Filter can include the machine’s location, actuator input, sensor readings, motion sensor data, etc., while the outputs are a pose estimate and a feature map. For SLAM, an Extended Kalman Filter (EKF) is typically used as it is better suited than a standard Kalman Filter to handling complex, non-linear behaviors such as those found in unfamiliar real-world environments.
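To give a feel for what an EKF does, here is a deliberately tiny Python sketch with a single state variable: the machine's position along a straight track. The sensor measures the straight-line distance to a beacon mounted above the start of the track, which is a non-linear function of position, so the filter linearizes it around the current prediction. The scenario, the `ekf_step` name, and the noise values are assumptions made for illustration. A full SLAM EKF would track a complete pose plus the feature map in one larger state vector.

```python
import math

BEACON_HEIGHT_M = 2.0   # hypothetical beacon height above the track
MOTION_NOISE = 0.04     # assumed variance added by each move
SENSOR_NOISE = 0.01     # assumed variance of the range sensor


def ekf_step(x, p, u, z):
    """One predict/update cycle for position x with variance p.

    u is the commanded displacement (actuator input) and z is the
    measured distance to the beacon (sensor reading).
    """
    # Predict: apply the motion model and grow the uncertainty.
    x_pred = x + u
    p_pred = p + MOTION_NOISE

    # Update: linearize the measurement model h(x) = sqrt(x^2 + height^2)
    # around the prediction. Its derivative is the 1x1 Jacobian.
    expected = math.hypot(x_pred, BEACON_HEIGHT_M)
    h_jac = x_pred / expected
    innovation_var = h_jac * p_pred * h_jac + SENSOR_NOISE
    gain = p_pred * h_jac / innovation_var
    x_new = x_pred + gain * (z - expected)
    p_new = (1.0 - gain * h_jac) * p_pred
    return x_new, p_new


prior_x, prior_p = 3.0, 0.25   # initial belief about position
u = 1.0                        # commanded move of 1 m
true_x = 4.3                   # where the machine actually ended up
z = math.hypot(true_x, BEACON_HEIGHT_M)  # noise-free reading, for clarity

x_new, p_new = ekf_step(prior_x, prior_p, u, z)
print("estimate:", x_new, "variance:", p_new)
```

The updated estimate moves from the motion-only prediction toward the true position, and the variance shrinks, reflecting increased confidence after the observation.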
SLAM is generally recursive in nature because the machine is constantly alternating between observing and moving. During observation, the machine estimates feature locations to determine the environment’s state, using known information to increase its confidence that its knowledge of the current state is correct. While moving, the machine gains new vantage points that decrease this confidence, so new observations are required to update its understanding of the state.
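The alternation between moving and observing can be illustrated by tracking only the uncertainty (variance) of a single estimated quantity. In the Python sketch below, the `move` and `observe` functions and their noise values are hypothetical; they follow the standard one-dimensional Kalman variance equations.

```python
def move(variance, motion_noise=0.5):
    """Moving adds uncertainty to the estimate."""
    return variance + motion_noise


def observe(variance, sensor_noise=0.2):
    """Observing blends in a measurement and reduces uncertainty."""
    gain = variance / (variance + sensor_noise)
    return (1.0 - gain) * variance


variance = 1.0
history = []
for _ in range(5):
    variance = move(variance)
    history.append(("move", variance))
    variance = observe(variance)
    history.append(("observe", variance))

for step, value in history:
    print(f"{step:8s} variance = {value:.3f}")
```

The printed values rise after every move and fall after every observation, settling into a repeating pattern. That saw-tooth is the recursion described above in its simplest form.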
SLAM must also deal with re-localization, which is the process of recovering from lost state (e.g., due to kidnapping where the machine is picked up and placed elsewhere). Recovery requires that the machine be placed in a part of the environment that it has visited before, so it can identify its location from previously collected data.
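One simple way to picture re-localization is as a lookup: compare the features the machine can see now with the features it recorded at places it has visited, and pick the best match if it is good enough. The Python sketch below does this with sets of feature labels and a Jaccard overlap score. The `relocalize` function, the labels, and the threshold are illustrative assumptions; practical systems match numeric feature descriptors rather than names.

```python
def relocalize(observed, visited_places, min_overlap=0.5):
    """Return the stored pose whose features best match, or None."""
    best_pose, best_score = None, 0.0
    for pose, features in visited_places:
        union = observed | features
        score = len(observed & features) / len(union) if union else 0.0
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose if best_score >= min_overlap else None


# Previously collected data: (pose, features seen from that pose).
visited_places = [
    ((0.0, 0.0), {"door", "window", "desk"}),
    ((5.0, 2.0), {"shelf", "plant", "lamp", "sofa"}),
]

# After being picked up and moved, the machine sees familiar features.
recovered = relocalize({"shelf", "lamp", "sofa"}, visited_places)
# In a part of the environment it has never visited, recovery fails.
unknown = relocalize({"stairs", "railing"}, visited_places)
print(recovered, unknown)
```

The second call returns `None` because nothing in the stored data resembles the current view, which mirrors the requirement that the machine be placed somewhere it has visited before.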
Visual Localization and Mapping
With converging advances in digital camera technology, AI, machine vision algorithms, sensors, and raw processing power, Visual SLAM (vSLAM) has become a popular approach to SLAM.
vSLAM relies on visual input from one or more cameras. This can be done with monocular or stereo images by applying techniques such as image recognition and/or depth sensing.
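For stereo images, depth sensing commonly rests on a simple relationship: for a calibrated, rectified camera pair, depth equals focal length times baseline divided by disparity (the horizontal pixel shift of a feature between the left and right images). The Python sketch below applies that formula; the function name and the camera parameters are made-up values for illustration.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of a feature seen by a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, lenses 12 cm apart.
FOCAL_LENGTH_PX = 700.0
BASELINE_M = 0.12

near = depth_from_disparity(42.0, FOCAL_LENGTH_PX, BASELINE_M)  # about 2 m
far = depth_from_disparity(21.0, FOCAL_LENGTH_PX, BASELINE_M)   # about 4 m
print(near, far)
```

Halving the disparity doubles the depth, so distant features produce very small disparities and their depth estimates become increasingly sensitive to pixel-level error.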
vSLAM also incorporates machine vision techniques including preprocessing data, feature detection, object detection and classification, and object tracking/navigation. Such techniques can augment vSLAM with new ways of identifying features and new functionality such as tracking moving objects.
Visual Inertial SLAM (VISLAM) further extends vSLAM by combining camera data with data collected from inertial measurement units (IMUs) to provide six-degrees-of-freedom (6DoF) pose data for a machine.
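The benefit of combining inertial and visual data can be seen even on a single axis. The Python sketch below uses a complementary filter (a much simpler stand-in for the filters used in real VISLAM systems) to fuse a fast but biased gyroscope with slower, drift-free yaw estimates from a camera. The `fuse_yaw` function, the sample rates, the bias, and the blend weight are all assumptions chosen for illustration, and angle wrap-around is ignored for brevity.

```python
def fuse_yaw(yaw, gyro_rate, dt, camera_yaw=None, camera_weight=0.1):
    """Integrate the gyro, then nudge toward the camera yaw if available."""
    yaw = yaw + gyro_rate * dt                     # inertial prediction
    if camera_yaw is not None:
        yaw += camera_weight * (camera_yaw - yaw)  # visual correction
    return yaw


DT = 0.005        # assumed 200 Hz IMU
GYRO_BIAS = 0.02  # rad/s reported even though the machine is not turning

gyro_only = 0.0
fused = 0.0
for step in range(2000):                 # 10 seconds of data
    # Assume the camera delivers a yaw estimate on every 10th IMU sample.
    camera_yaw = 0.0 if step % 10 == 9 else None
    gyro_only = fuse_yaw(gyro_only, GYRO_BIAS, DT)
    fused = fuse_yaw(fused, GYRO_BIAS, DT, camera_yaw)

print("gyro only:", gyro_only)  # drifts steadily away from the true yaw of 0
print("fused:    ", fused)      # stays close to 0
```

The IMU supplies smooth, high-rate motion updates between camera frames, while the camera keeps the accumulated drift bounded. Extending the same idea to all three rotations and three translations gives a 6DoF pose.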
Bringing vSLAM to the Real World
vSLAM, and VISLAM in particular, are cutting-edge SLAM approaches that can provide intelligent localization, environment mapping, and navigation.
Key considerations for implementing vSLAM and VISLAM approaches include:
- Choosing the type of imagery and detail to process, and the frequency at which to analyze it.
- Deciding on a vSLAM/VISLAM algorithm.
- Assessing hardware requirements such as the processing power to handle the compute-intensive requirements of vSLAM and VISLAM, RAM and storage needs for feature/map data, and the types of sensors to incorporate.
- Deciding on what to use as the origin of the world coordinate space.
Qualcomm Technologies, Inc. (QTI) offers a number of products for implementing VISLAM solutions. These include the Qualcomm® Machine Vision SDK, which provides a number of useful features for VISLAM, notably an implementation of an EKF to fuse IMU and camera tracking data for 6DoF pose estimates, and support for 3D point clouds for tracked feature points. QTI also offers the Qualcomm® Computer Vision SDK, which includes a rich set of computer vision functions including object detection, motion and object tracking, and various math and vector operations. And of course, the Qualcomm® Neural Processing SDK can be used to run AI workloads such as on-device neural network inference. These SDKs are all designed to run on Qualcomm® Snapdragon™ mobile platforms such as the one found on the Qualcomm® Robotics RB3 Platform. Together these products provide a comprehensive solution that utilizes hardware features with tight integration, while abstracting their intricacies.
So now that you know more about SLAM, vSLAM, and VISLAM, check out our VISLAM flight example to see this technology in action.
- SLAM is also being used in Extended Reality Experiences to help users avoid walls and other objects, and to improve rendering accuracy.
- See our sensor blog for more information about the use of sensors.
- Some of these challenges have led to the development of technologies such as LIDAR and Sonar.
- Pose and structure are usually computed in parallel (e.g., in separate threads), hence the term "Simultaneous" Localization and Mapping.
- "State" refers to the features detected in the environment at any given moment.
Qualcomm Snapdragon, Qualcomm Neural Processing SDK, Qualcomm Robotics RB3, Qualcomm Computer Vision SDK, Qualcomm Math Library and Qualcomm Machine Vision SDK are products of Qualcomm Technologies, Inc. and/or its subsidiaries.