Improving VR Performance Using Motion Estimation OpenGL Extensions

Tuesday 2/16/21 10:01am | Posted By Jonathan Wicks

Snapdragon and Qualcomm branded products are products of
Qualcomm Technologies, Inc. and/or its subsidiaries.

Co-written by Jonathan Wicks and Sam Holmes

XR presents many challenges, with its high framerate requirements, high resolution, and low latency, all within a small power and thermal budget. One foundational technology that can be used to counter the workload increase in such an environment is motion estimation – the process of calculating the movement of objects across frames of recorded or rendered images.

Motion estimation (also known as optical flow estimation) techniques analyze the perceived motion across frames of images and capture these movements as motion vectors. Motion vectors are often provided at a particular granularity, for example, one motion vector for every 8x8 block of pixels.

In VR, motion vectors are used with techniques such as Asynchronous Spacewarp (ASW) or Motion Smoothing to create extrapolated frames, which are displayed in place of rendered content, allowing the application to render at a lower framerate.
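
As a rough illustration of the extrapolation idea (and not the actual ASW or Motion Smoothing algorithm), the C sketch below advances each block center along its motion vector under a constant-velocity assumption; the function name and data layout are purely illustrative.

  #include <stddef.h>

  typedef struct { float x, y; } Vec2;

  // Predict where each block center lands in a synthesized frame displayed
  // dt_extrap after the most recent rendered frame, assuming the motion
  // observed over the last frame interval (dt_frame) continues unchanged.
  void extrapolate_block_centers(const Vec2 *motion, const Vec2 *centers,
                                 Vec2 *predicted, size_t count,
                                 float dt_extrap, float dt_frame)
  {
      float scale = dt_extrap / dt_frame;   // e.g., 0.5 for a midpoint frame
      for (size_t i = 0; i < count; ++i) {
          predicted[i].x = centers[i].x + motion[i].x * scale;
          predicted[i].y = centers[i].y + motion[i].y * scale;
      }
  }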

In addition to Asynchronous Spacewarp, other common uses for motion estimation include video compression, depth from stereo, structure from motion, image alignment, motion blur, and object tracking.

The figure below shows a visualization of motion vectors calculated for a block of pixels across two frames of video:

Example of motion vectors calculated for a block of pixels between two image frames.

The Qualcomm® Adreno™ 6xx GPU in our Snapdragon® mobile platforms, such as the Snapdragon 865 mobile platform, added a texture unit capable of efficient, high-performance hardware-accelerated block matching. To help developers take advantage of this in VR, games, and other apps on Snapdragon devices, our VR team created an OpenGL vendor extension called QCOM_motion_estimation.

Qualcomm Technologies, Inc. is a member of the Khronos Group, a non-profit consortium that manages OpenGL and other open-standard APIs. You can learn more about the Khronos Group and their XR-specific API by reading our OpenXR and an Introduction to the Khronos Group blog on QDN.

Overview of the Extension

The QCOM_motion_estimation extension adds two API functions to OpenGL. These functions take in reference and target images (textures) representing the two image frames from which to calculate motion. They then populate an output texture containing the corresponding motion vectors. These functions have the following signatures:

  void TexEstimateMotionQCOM(uint ref,
                             uint target,
                             uint output);
  void TexEstimateMotionRegionsQCOM(uint ref,
                                    uint target,
                                    uint output,
                                    uint mask);

We designed the functions to have a flexible output format. Motion vectors are written into a filterable GL image format (GL_RGBA16F), which makes them easy to consume directly in other OpenGL operations and avoids the need to convert to other formats. The R and G channels of each pixel in the output texture contain the X and Y magnitudes, respectively, while the B and A channels are reserved for future use. Subpixel refinement is currently provided using 16-bit floats.
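
As a minimal sketch of how the pieces fit together, the C code below creates the single-channel inputs and the RGBA16F output and then invokes the extension. It assumes a GLES 3.x context that advertises GL_QCOM_motion_estimation, the function-pointer typedef from a recent GLES2/gl2ext.h, and the 8x8 block granularity used as an example earlier; check what your driver actually reports before sizing the output.

  #include <GLES3/gl3.h>
  #include <GLES2/gl2ext.h>
  #include <EGL/egl.h>

  static GLuint create_tex(GLenum internal_fmt, GLsizei w, GLsizei h)
  {
      GLuint tex;
      glGenTextures(1, &tex);
      glBindTexture(GL_TEXTURE_2D, tex);
      glTexStorage2D(GL_TEXTURE_2D, 1, internal_fmt, w, h);
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
      glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
      return tex;
  }

  void estimate_motion(GLsizei width, GLsizei height)
  {
      const GLsizei block = 8;  // assumed block granularity (one vector per block)

      // Single-channel luminance inputs, as described above.
      GLuint ref    = create_tex(GL_R8, width, height);
      GLuint target = create_tex(GL_R8, width, height);

      // One RGBA16F texel per block: R/G hold the X/Y motion, B/A are reserved.
      GLuint output = create_tex(GL_RGBA16F, width / block, height / block);

      // Vendor extension, so resolve the entry point at run time.
      PFNGLTEXESTIMATEMOTIONQCOMPROC glTexEstimateMotionQCOM =
          (PFNGLTEXESTIMATEMOTIONQCOMPROC)eglGetProcAddress("glTexEstimateMotionQCOM");

      // ... render or upload the two frames into ref and target here ...

      glTexEstimateMotionQCOM(ref, target, output);
      // output now holds the per-block motion vectors and can be sampled
      // like any other filterable texture.
  }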

TexEstimateMotionRegionsQCOM() extends TexEstimateMotionQCOM() by taking an additional stencil mask parameter: a texture defining a region of interest for which motion vectors are to be calculated. The stencil mask can be used to reduce the workload on a per-frame basis, for example when you don't need to analyze the whole scene because the locations of certain objects in the previous frame are already known. The stencil mask is also useful for foveated rendering, where the workload can be focused on the regions that matter most.
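
Continuing inside the estimate_motion() sketch above, the regions variant could be called roughly as follows. The mask is created here as a GL_STENCIL_INDEX8 texture at the input resolution, with nonzero values assumed to mark the region of interest; both the format and the mask semantics are assumptions to verify against the extension spec (a GLES 3.2 context is assumed for stencil-format textures).

  // Resolve the regions entry point (typedef assumed from GLES2/gl2ext.h).
  PFNGLTEXESTIMATEMOTIONREGIONSQCOMPROC glTexEstimateMotionRegionsQCOM =
      (PFNGLTEXESTIMATEMOTIONREGIONSQCOMPROC)eglGetProcAddress("glTexEstimateMotionRegionsQCOM");

  // Assumed: a stencil-format mask at input resolution, nonzero = analyze.
  GLuint mask;
  glGenTextures(1, &mask);
  glBindTexture(GL_TEXTURE_2D, mask);
  glTexStorage2D(GL_TEXTURE_2D, 1, GL_STENCIL_INDEX8, width, height);
  // ... mark only the blocks of interest (e.g., around known object positions) ...

  glTexEstimateMotionRegionsQCOM(ref, target, output, mask);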

Typically motion estimation is performed on single-channel luminance textures, as in this extension. Apps can convert the reference and target textures using the rgb_2_yuv() shader function from the EXT_YUV_target extension.
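
One possible way to produce those single-channel inputs is to render the RGB frame into a GL_R8 target using a small fragment shader that calls rgb_2_yuv(). The shader below (GLSL ES 3.00, shown as a C string) is a sketch that assumes GL_EXT_YUV_target is available; the itu_601_full_range conversion standard is an arbitrary choice for illustration.

  // Fragment shader that writes only the luma (Y) component of the input color.
  static const char *kLumaFragmentShader =
      "#version 300 es\n"
      "#extension GL_EXT_YUV_target : require\n"
      "precision mediump float;\n"
      "uniform sampler2D uColorTex;\n"
      "in vec2 vUV;\n"
      "out vec4 outLuma;\n"
      "void main() {\n"
      "    vec3 rgb = texture(uColorTex, vUV).rgb;\n"
      "    float luma = rgb_2_yuv(rgb, itu_601_full_range).x;  // .x is Y\n"
      "    outLuma = vec4(luma, 0.0, 0.0, 1.0);\n"
      "}\n";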

After the functions have generated the motion vector texture, developers can use it for their particular use case (e.g., Asynchronous Spacewarp, compression, etc.). Developers can also render the vectors (e.g., as an overlay) to help visualize the movements and can even use the texture for 3D operations such as mesh perturbation.

Performance and Accuracy

In conjunction with the underlying block-matching hardware acceleration provided by Adreno 6xx, the extension has been optimized for true motion estimation (TME) with maximum GPU performance and low latency. The work can be inlined with other GPU workloads without additional overhead, CPU commands can be hidden behind ongoing GPU work, and the motion-estimation pass can be performed just-in-time. For typical VR frame resolutions, the motion vectors can often be generated in under 1 ms for both eyes, allowing the vectors to be created on demand. The block-matching hardware also operates in a hierarchical fashion for speed and a greater search range.

Evaluating the accuracy of a motion estimation system is often done by calculating the average end-point error (average EPE). For each estimated motion vector (Vest), the end-point error is its Euclidean distance from the corresponding ground truth vector (Vgt): ||Vest – Vgt||. After the end-point error for each motion vector is determined, the errors are averaged (the average EPE) to give the final measure of accuracy.
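
For reference, the computation itself is only a few lines of C; the sketch below assumes the estimated and ground-truth vectors have already been read back into simple float pairs (the struct and function names are illustrative).

  #include <math.h>
  #include <stddef.h>

  typedef struct { float x, y; } Vec2;

  // Average end-point error: the mean Euclidean distance between each
  // estimated motion vector and its ground-truth counterpart.
  float average_epe(const Vec2 *est, const Vec2 *gt, size_t count)
  {
      double sum = 0.0;
      for (size_t i = 0; i < count; ++i) {
          double dx = (double)est[i].x - (double)gt[i].x;
          double dy = (double)est[i].y - (double)gt[i].y;
          sum += sqrt(dx * dx + dy * dy);   // ||Vest - Vgt||
      }
      return count ? (float)(sum / count) : 0.0f;
  }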

These calculations require pre-existing ground truth motion vectors, typically based on frames annotated with the real, known motion, such as the Middlebury Optical Flow Dataset. Using the QCOM_motion_estimation extension on Adreno 6xx with ground truth from the two-frame Middlebury dataset, our average EPE is 0.91. Runtime for this dataset is under 0.5 ms per frame pair.

Conclusion

Motion estimation encompasses powerful techniques that seek to calculate the perceived motion of objects across frames of recorded or rendered data. Using this information, developers can solve a variety of problems ranging from compressing video to smoothing out jitter in VR.

Interested in learning more about technologies from Qualcomm Technologies that can be used for VR? Be sure to check out the following resources:

For additional information, be sure to sign up on our XR Solutions Page to receive communications about upcoming announcements.


Snapdragon, Qualcomm Adreno, and Qualcomm Computer Vision Suite are products of Qualcomm Technologies, Inc. and/or its subsidiaries.