Reducing Rendering Work and Memory Operations in Stereoscopic Scenes – New Multiview Extensions for Vulkan

Thursday 10/26/23 05:30am | Posted By Jonathan Tinkham

Snapdragon and Qualcomm branded products are products of
Qualcomm Technologies, Inc. and/or its subsidiaries.

To render stereoscopic views in extended reality (XR) development, how do you treat each view differently and account for the difference in perspective? A common XR technique is to use multi-viewport to give each eye its own viewport and scissor out different regions per eye, but multi-viewport scissors do not limit clears, and the approach is incompatible with other XR extensions. It can also prevent foveated rendering and slow down the rendering work.

To help you squeeze as much performance as possible out of the hardware, we've released multiview rendering extensions that can accelerate and simplify rendering of stereoscopic content on Qualcomm Adreno GPUs. The extensions allow a scene to be rendered simultaneously to different image layers. They're designed for chipsets like the Snapdragon XR2 platform, which supports hardware-accelerated multiview rendering in both Vulkan and OpenGL ES. By duplicating draws in the GPU itself and improving cache reuse of data such as vertices and textures, the extensions can reduce CPU overhead and improve GPU performance.

This post explains the technical problem of rendering stereoscopic views efficiently and describes the solution. XR developers will find code samples that can help their applications skip unnecessary rendering work and memory operations.

Why it’s non-trivial to render stereoscopic content

Rendering stereoscopic scenes involves processing each view with stereo disparity to reproduce the difference seen by each eye. In stereoscopic vision, the field of view of each eye partially, but not entirely, overlaps the field of view of the other.

In the context of XR, the result is that each view’s rendered scene is slightly offset from the other because of the asymmetric fields of view. The asymmetry means that shared portions of a scene are more likely to be separated during rendering and less likely to be rendered together. With less overlap comes less cache locality and fewer benefits from multiview rendering hardware.

Asymmetric multiview rendering
The images below depict asymmetric multiview rendering. At the top, drawings show the left and right eyes, the field of view of each eye (blue broken lines) and the effective rendered area (horizontal green bars). At the bottom are the rendered views, as seen by each eye.

The image below shows the same scene, with the rendered views of both eyes mixed together. Red represents the view from the left eye and blue the view from the right eye. Where the rendered views overlap, the red and blue blend into purple. The more purple, the better, because it shows the amount of overlap between the views.

By switching to a symmetric field of view, however, you can reduce the effective disparity during rendering and realize the gains of concurrent rendering.

Symmetric multiview rendering
You can reduce that disparity with symmetric multiview rendering by increasing the size of the rendered area of the scene. In the drawings below, the red bars and black broken lines show the increase in rendered area for each eye. At bottom are the rendered views for each eye; the inverted scene color on the inside of each image is the over-rendering.

Symmetric multiview rendering allows each view to be padded in such a way that the rendered scene is more closely aligned between left and right views. The image below shows symmetric multiview rendering of the scene, with the views from both eyes mixed together. Again, red represents the view from the left eye and blue the view from the right eye. The more purple, the better, because it indicates overlap. Over-rendered areas, in inverted colors, are left unchanged by the rendering.

The technique improves performance over asymmetric rendering because it uses multiview hardware more effectively. However, it introduces more GPU rendering work (shown in the areas of inverted color) that ends up unused and ultimately discarded.
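To make the extra work concrete, here is a small C++ sketch of the arithmetic. It assumes, as in the per-view scissor example later in this post, that each view is 1640 pixels wide and that a 200-pixel strip of it is over-rendering. The function name is hypothetical, and only the width is treated as padded because both views keep the same height.

```cpp
#include <cstdint>

// Fraction of a padded (symmetric-FOV) view that is over-rendered, i.e.
// rendered but never shown to that eye.
//   renderedWidth: width of the padded per-view image, in pixels
//   usefulWidth:   width of the region the eye actually sees
// Height is assumed identical for both, so the width ratio equals the area ratio.
double overRenderFraction(std::uint32_t renderedWidth, std::uint32_t usefulWidth) {
    if (renderedWidth == 0 || usefulWidth >= renderedWidth) {
        return 0.0;  // nothing is over-rendered
    }
    return static_cast<double>(renderedWidth - usefulWidth) / renderedWidth;
}
```

With those example numbers, overRenderFraction(1640, 1440) is about 0.122, meaning roughly 12 percent of each view's pixels would be rendered and then discarded.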

New Vulkan extensions

It's true that Vulkan allows you to exclude rendering work and memory operations through scissors and render areas, respectively. But by default, those settings are shared by all views, making it difficult to exclude work on a per-view or per-layer basis. Multi-viewport behavior in Vulkan supports multiple scissors, but:

  • you have to modify shader code to take advantage of it;
  • the behavior is not compatible with tile-based foveated rendering (exposed through the VK_EXT_fragment_density_map extension) on Adreno GPUs; and
  • core Vulkan has no way to limit, on a per-view basis, the memory operations related to clears, loads and stores.

To address those issues and expose more versatile ways to exclude and reduce multiview rendering work, we've introduced two new extensions: VK_QCOM_multiview_per_view_viewports and VK_QCOM_multiview_per_view_render_areas. They are public Khronos Vulkan extensions, defined in the Vulkan headers, that XR developers can use with the Vulkan driver for the Adreno GPU.

Skipping rendering work: Per-view viewports and scissors
The VK_QCOM_multiview_per_view_viewports extension introduces a new device feature that, when enabled, makes each view implicitly use the scissor and viewport at its own index during rendering. For example, the first view automatically uses scissor[0] and the second view uses scissor[1]. That means specifying per-view scissors and viewports does not require modifying shader code, and it is compatible with tile-based foveation on the Adreno GPU.

With the extension, your applications can reduce the work of multiview rendering. One example is scissoring away the over-rendered areas (shown in inverted colors) in the symmetric field-of-view approach above. Scissoring those areas avoids their vertex and fragment workloads, which can yield significant performance gains.
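As a sketch of what such scissors look like, the helper below derives one scissor per view that excludes the over-rendered strip. It assumes view 0 is the left eye and view 1 the right eye, with the over-rendering on the inner side of each image as in the drawings above. Rect2D is a stand-in for VkRect2D so the sketch compiles without the Vulkan headers, and perEyeScissors is a hypothetical name.

```cpp
#include <array>
#include <cstdint>

// Minimal stand-in for VkRect2D (offset + extent).
struct Rect2D {
    std::int32_t x, y;
    std::uint32_t width, height;
};

// Hypothetical helper: one scissor per view that skips the over-rendered
// strip of width `pad` on the inner side of each eye's image.
//   View 0 (assumed left eye):  over-render is on the right edge.
//   View 1 (assumed right eye): over-render is on the left edge.
std::array<Rect2D, 2> perEyeScissors(std::uint32_t viewWidth,
                                     std::uint32_t viewHeight,
                                     std::uint32_t pad) {
    const std::uint32_t visible = pad < viewWidth ? viewWidth - pad : 0;
    const auto rightOffset = static_cast<std::int32_t>(viewWidth - visible);
    return {{
        {0, 0, visible, viewHeight},            // view 0: keep the left part
        {rightOffset, 0, visible, viewHeight},  // view 1: keep the right part
    }};
}
```

Called as perEyeScissors(1640, 1440, 200), it produces the same two rectangles as the scissor example that follows in this post.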

In short, this extension uses per-view viewports and scissors to skip rendering work on a per-view basis.

Start by enabling the extension and setting its feature bit during device creation, as shown in the code block below:

VkDeviceCreateInfo deviceCreateInfo = {};
std::vector<const char*> deviceExtensions;

// Add the per-view viewports extension to the extension list
deviceExtensions.push_back(VK_QCOM_MULTIVIEW_PER_VIEW_VIEWPORTS_EXTENSION_NAME);

// ... (fill in device create info)

deviceCreateInfo.enabledExtensionCount = static_cast<uint32_t>(deviceExtensions.size());
deviceCreateInfo.ppEnabledExtensionNames = deviceExtensions.data();

// Set the feature bit to enable the behavior
// (support can be queried first with vkGetPhysicalDeviceFeatures2)
VkPhysicalDeviceMultiviewPerViewViewportsFeaturesQCOM perViewPnext = {};

perViewPnext.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MULTIVIEW_PER_VIEW_VIEWPORTS_FEATURES_QCOM;
perViewPnext.multiviewPerViewViewports = VK_TRUE;

// Prepend to the pNext chain (the feature struct's pNext is a non-const void*)
perViewPnext.pNext = const_cast<void*>(deviceCreateInfo.pNext);
deviceCreateInfo.pNext = &perViewPnext;

// Create the device (m_pDevice points to the VkDevice handle to fill in)
VkResult result = vkCreateDevice(physicalDevice, &deviceCreateInfo, pAllocator, m_pDevice);

When using multiview rendering, set as many scissors and viewports as there are views. If the same scissor or viewport is to be used across views, be sure to duplicate it for every view. For example, the following code uses a different scissor for each view while rendering the same viewport:

VkViewport viewports[2] = {};
VkRect2D scissors[2] = {};

// View 0: exclude the rightmost 200 pixels
scissors[0].offset.x = 0;
scissors[0].offset.y = 0;
scissors[0].extent.width = 1640 - 200;
scissors[0].extent.height = 1440;

// View 1: exclude the leftmost 200 pixels
scissors[1].offset.x = 200;
scissors[1].offset.y = 0;
scissors[1].extent.width = 1640 - 200;
scissors[1].extent.height = 1440;

viewports[0].x = 0;
viewports[0].y = 0;
viewports[0].width = 1640;
viewports[0].height = 1440;
viewports[0].minDepth = 0.0f;
viewports[0].maxDepth = 1.0f;

// Both views share the same viewport, duplicated per view
viewports[1] = viewports[0];

// One viewport and one scissor per view, starting at index 0
vkCmdSetViewport(m_hCmdBuf, 0, 2, viewports);
vkCmdSetScissor(m_hCmdBuf, 0, 2, scissors);
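When a view should reuse the same viewport or scissor, as both views do with the viewport above, the state still has to be duplicated once per view. A generic helper for that could look like the following sketch. Viewport is a stand-in for VkViewport (same members, in the same order) and replicatePerView is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for VkViewport so the sketch compiles without the Vulkan headers.
struct Viewport {
    float x, y, width, height, minDepth, maxDepth;
};

// Hypothetical helper: copies one viewport (or scissor) so that every view
// index has its own entry.
template <typename T>
std::vector<T> replicatePerView(const T& state, std::uint32_t viewCount) {
    return std::vector<T>(viewCount, state);
}
```

The resulting vector's data() and size() map onto the pViewports and viewportCount parameters of vkCmdSetViewport, with the size cast to uint32_t.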

The approach applies to both dynamic and static viewport/scissor pipeline state. The feature is enabled device-wide at device creation and applies to all command buffers and pipelines made with that device, so be sure to program a scissor and viewport for every view.
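Because view index i reads viewport and scissor i, the number of entries must reach the highest view index set in the render pass's view mask (VkRenderPassMultiviewCreateInfo::pViewMasks). The sketch below is a sanity check an application might run before recording draws. It illustrates that assumption only and does not replace the validation rules in the Vulkan specification; both function names are hypothetical.

```cpp
#include <cstdint>

// Entries needed so every view in a multiview view mask has its own
// viewport/scissor: index of the highest set bit, plus one (0 for an empty mask).
std::uint32_t requiredPerViewCount(std::uint32_t viewMask) {
    std::uint32_t count = 0;
    while (viewMask != 0) {
        ++count;
        viewMask >>= 1;
    }
    return count;
}

// True if the programmed viewport and scissor counts cover every view.
bool coversAllViews(std::uint32_t viewMask, std::uint32_t viewportCount,
                    std::uint32_t scissorCount) {
    const std::uint32_t needed = requiredPerViewCount(viewMask);
    return viewportCount >= needed && scissorCount >= needed;
}
```

For the two-view example above, the view mask is 0b11, so two viewports and two scissors are required.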

(Note: VK_QCOM_multiview_per_view_viewports is compatible with standard multi-viewport in Vulkan. Using the Layer/ViewportIndex shader built-ins in a pipeline will cause that pipeline, and only that pipeline, to override the implicit behavior of this extension.)

Skipping memory operations: Per-view render areas
The VK_QCOM_multiview_per_view_render_areas extension fills a gap in core Vulkan and other extensions: the ability to specify multiple render areas.

Render areas are the regions that your application tells the Vulkan implementation it intends to render. While the application is responsible for enforcing that through scissoring, the implementation is responsible for operations in this region such as clears, loads and stores. A render area is specified once, when the render pass begins (vkCmdBeginRenderPass), and applies to all attachments and layers, including all views when rendering with multiview.

The per-view scissors from VK_QCOM_multiview_per_view_viewports enable the implementation to skip vertex and fragment work on a view-by-view basis, as described above. However, there will still be unnecessary memory operations due to clears, loads and stores that can interfere with work-skipping strategies in the implementation and GPU hardware. VK_QCOM_multiview_per_view_render_areas allows your application to specify a unique render area for each view. That enables the implementation to skip those memory operations and gives work-skipping strategies more room to operate.

For example, when you use per-view render areas with tile-based foveation on the Adreno GPU, the GPU can skip entire tiles’ worth of work. Plus, the Adreno GPU can reclaim the skipped tiles, resulting in significant workload savings and efficient use of tile memory beyond just skipping the unneeded memory operations.
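The following sketch illustrates why render-area granularity matters for tiles. It counts how many tiles of a uniform grid a view's render area touches; tiles it does not touch have nothing to clear, load or store for that view. The 128 x 128 tile size is purely hypothetical: on Adreno the driver and hardware choose the tile layout, and this is not how the driver makes its decisions. Rect2D stands in for VkRect2D, and the function names are hypothetical.

```cpp
#include <cstdint>

// Stand-in for VkRect2D so the sketch compiles without the Vulkan headers.
struct Rect2D {
    std::int32_t x, y;
    std::uint32_t width, height;
};

// Number of tiles in a uniform tileW x tileH grid covering the framebuffer.
std::uint32_t totalTiles(std::uint32_t fbWidth, std::uint32_t fbHeight,
                         std::uint32_t tileW, std::uint32_t tileH) {
    const std::uint32_t cols = (fbWidth + tileW - 1) / tileW;
    const std::uint32_t rows = (fbHeight + tileH - 1) / tileH;
    return cols * rows;
}

// Number of grid tiles that intersect a render area (assumes a
// non-negative offset). Tiles outside this set hold no work for the view.
std::uint32_t tilesTouched(const Rect2D& area, std::uint32_t tileW, std::uint32_t tileH) {
    if (area.width == 0 || area.height == 0) {
        return 0;
    }
    const auto left = static_cast<std::uint32_t>(area.x);
    const auto top = static_cast<std::uint32_t>(area.y);
    const std::uint32_t firstCol = left / tileW;
    const std::uint32_t lastCol = (left + area.width - 1) / tileW;
    const std::uint32_t firstRow = top / tileH;
    const std::uint32_t lastRow = (top + area.height - 1) / tileH;
    return (lastCol - firstCol + 1) * (lastRow - firstRow + 1);
}
```

With the per-view render areas used later in this post (1440 x 1440 at x = 0 and x = 200 in a 1640 x 1440 framebuffer), each view touches 144 of the 156 hypothetical tiles, leaving 12 per view with no work. Only whole tiles outside the render area can be skipped; a strip that only partially covers a tile does not free that tile.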

In summary, this extension allows the implementation to use per-view render areas to skip memory operations (clears, loads, stores) on a per-view basis. It also allows the implementation to use per-view render areas with a fragment density map, enabling the Adreno GPU to potentially re-use skipped tiles for more efficient rendering.

To use per-view render areas, enable the extension and set its feature bit during device creation, as shown in the code block below:

VkDeviceCreateInfo deviceCreateInfo = {};
std::vector<const char*> deviceExtensions;

// Add the per-view render areas extension to the extension list
deviceExtensions.push_back(VK_QCOM_MULTIVIEW_PER_VIEW_RENDER_AREAS_EXTENSION_NAME);

// ... (fill in device create info)

deviceCreateInfo.enabledExtensionCount = static_cast<uint32_t>(deviceExtensions.size());
deviceCreateInfo.ppEnabledExtensionNames = deviceExtensions.data();

// Set the feature bit to enable the behavior
// (support can be queried first with vkGetPhysicalDeviceFeatures2)
VkPhysicalDeviceMultiviewPerViewRenderAreasFeaturesQCOM perViewPnext = {};

perViewPnext.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MULTIVIEW_PER_VIEW_RENDER_AREAS_FEATURES_QCOM;
perViewPnext.multiviewPerViewRenderAreas = VK_TRUE;

// Prepend to the pNext chain (the feature struct's pNext is a non-const void*)
perViewPnext.pNext = const_cast<void*>(deviceCreateInfo.pNext);
deviceCreateInfo.pNext = &perViewPnext;

// Create the device (m_pDevice points to the VkDevice handle to fill in)
VkResult result = vkCreateDevice(physicalDevice, &deviceCreateInfo, pAllocator, m_pDevice);

Then, when calling vkCmdBeginRenderPass (or vkCmdBeginRendering), pass an array containing one render area per view through the extension struct in the pNext chain:

VkRenderPassBeginInfo rpBeginInfo = {};

rpBeginInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
rpBeginInfo.framebuffer = m_hFramebuffer;
rpBeginInfo.renderPass = m_hRenderPass;
rpBeginInfo.clearValueCount = m_nNumAttachments;
rpBeginInfo.pClearValues = m_vClearColors;

// Add per-view render areas to pNext
VkMultiviewPerViewRenderAreasRenderPassBeginInfoQCOM perViewRenderAreasPnext = {};
VkRect2D perViewRenderAreas[2] = {};

perViewRenderAreas[0].offset.x = 0;
perViewRenderAreas[0].offset.y = 0;
perViewRenderAreas[0].extent.width = 1640 - 200;
perViewRenderAreas[0].extent.height = 1440;

perViewRenderAreas[1].offset.x = 200;
perViewRenderAreas[1].offset.y = 0;
perViewRenderAreas[1].extent.width = 1640 - 200;
perViewRenderAreas[1].extent.height = 1440;

perViewRenderAreasPnext.sType = VK_STRUCTURE_TYPE_MULTIVIEW_PER_VIEW_RENDER_AREAS_RENDER_PASS_BEGIN_INFO_QCOM;
perViewRenderAreasPnext.perViewRenderAreaCount = 2;
perViewRenderAreasPnext.pPerViewRenderAreas = perViewRenderAreas;

// Add to pNext chain
perViewRenderAreasPnext.pNext = rpBeginInfo.pNext;
rpBeginInfo.pNext = &perViewRenderAreasPnext;

Finally, the renderArea member of VkRenderPassBeginInfo still applies, but it must now correspond to the union of all per-view render areas:

rpBeginInfo.renderArea.offset.x = 0; // min(x)
rpBeginInfo.renderArea.offset.y = 0; // min(y)
rpBeginInfo.renderArea.extent.width = 1640; // max(x+width) - min(x)
rpBeginInfo.renderArea.extent.height = 1440; // max(y+height) - min(y)
vkCmdBeginRenderPass(m_hCmdBuf, &rpBeginInfo, VK_SUBPASS_CONTENTS_INLINE);

With that, the implementation will be able to use this information to skip memory operations outside those regions on a per-view basis.
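The union rule is easy to get wrong by hand, so here is a small C++ sketch that computes it from the per-view render areas, using the same min/max formulas as the comments in the render pass code, plus a check that each per-view area lies inside the result. Rect2D stands in for VkRect2D, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-in for VkRect2D so the sketch compiles without the Vulkan headers.
struct Rect2D {
    std::int32_t x, y;
    std::uint32_t width, height;
};

// Smallest rectangle containing every per-view render area:
// offset = min(x), min(y); extent = max(x + width) - min(x), max(y + height) - min(y).
// Assumes a non-empty list. 64-bit math avoids overflow in x + width.
Rect2D unionOfRenderAreas(const std::vector<Rect2D>& areas) {
    std::int64_t minX = areas[0].x, minY = areas[0].y;
    std::int64_t maxX = minX + areas[0].width, maxY = minY + areas[0].height;
    for (const Rect2D& r : areas) {
        minX = std::min<std::int64_t>(minX, r.x);
        minY = std::min<std::int64_t>(minY, r.y);
        maxX = std::max<std::int64_t>(maxX, static_cast<std::int64_t>(r.x) + r.width);
        maxY = std::max<std::int64_t>(maxY, static_cast<std::int64_t>(r.y) + r.height);
    }
    return {static_cast<std::int32_t>(minX), static_cast<std::int32_t>(minY),
            static_cast<std::uint32_t>(maxX - minX), static_cast<std::uint32_t>(maxY - minY)};
}

// True if `inner` lies entirely within `outer`.
bool contains(const Rect2D& outer, const Rect2D& inner) {
    const std::int64_t outerRight = static_cast<std::int64_t>(outer.x) + outer.width;
    const std::int64_t outerBottom = static_cast<std::int64_t>(outer.y) + outer.height;
    const std::int64_t innerRight = static_cast<std::int64_t>(inner.x) + inner.width;
    const std::int64_t innerBottom = static_cast<std::int64_t>(inner.y) + inner.height;
    return inner.x >= outer.x && inner.y >= outer.y &&
           innerRight <= outerRight && innerBottom <= outerBottom;
}
```

For the two render areas above, unionOfRenderAreas returns an offset of (0, 0) and an extent of 1640 x 1440, matching the renderArea values in the example.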

Your turn

The extensions are our contribution to the open Vulkan standard. Using them together can expand the effective framebuffer size in your applications. They offer the benefits of symmetric multiview rendering without GPU workload costs like unnecessary rendering. You’ll take greater advantage of the multiview hardware on the Snapdragon XR2 platform, with potential performance gains.

Optimizing multiview performance in hardware is just one example of how to use these extensions. By controlling viewports, scissors and per-view render areas, you open up many other rendering strategies. We're keen to see how else the developer community uses them, so try them out and let us know what you discover.

You'll find more information on VK_QCOM_multiview_per_view_viewports and VK_QCOM_multiview_per_view_render_areas in the Vulkan specification and in the extension documents themselves.


