Hi!
I'm working on a console emulator renderer that relies on framebuffer fetch to perform custom pixel operations. On other mobile GPUs (Mali, PowerVR), using framebuffer fetch through input attachments and subpassLoad gives the proper result, because they guarantee pixel ordering within a single draw call whose primitives cover overlapping pixels.
Doing the same on Vulkan with Adreno GPUs doesn't give the same result (ordering is not guaranteed). Adreno supports GL_EXT_shader_framebuffer_fetch, which is supposed to guarantee ordering, so I guess the GPU supports it but it's not the default behavior on Vulkan? I know Adreno has the QCOM_shader_framebuffer_fetch_noncoherent extension, so maybe non-coherent fetch is the default behavior when using the Vulkan API on Adreno?
I can provide more info about the render pass setup if needed.
Thank you very much!
Although Adreno GPUs do have HW support for both coherent and non-coherent framebuffer fetch, the coherent mode comes at a significant performance penalty for some use cases. That's the primary reason QCOM_shader_framebuffer_fetch_noncoherent exists: it gives developers a way to opt in to better power/performance if they don't need the guaranteed ordering.
The Vulkan spec does not require guaranteed ordering for this case (subpass self-dependency), and therefore the Adreno driver configures the GPU for the non-coherent mode by default in Vulkan. It is likely that a Vulkan extension will be published soon that would offer the coherent behavior for Vulkan. I can't commit to when we'd support such an extension, but I would like to know more about your needs so we can prioritize.
I assume for your use case, judicious use of pipeline barriers in the subpass is not a viable option? Is there anything else about your use-case that you would want to share?
Hi!
Ok, so that confirms my theory :) Glad to hear you're working on an extension to support that use case!
The emulator tries to batch primitives that share the same material properties as much as it can, to reduce the number of draw calls made to the host system's graphics API. It would be possible to split batches when overlapping geometry is detected and insert a barrier to make sure framebuffer fetch gives proper results on Adreno devices, but that adds an extra burden on the CPU, which is already quite busy emulating the guest system and doing other stuff in the render thread. Using coherent framebuffer fetch would let me move some of that burden onto the GPU, which is not super busy since we're rendering relatively simple stuff.
In any case, I'm just happy to hear that an extension enabling coherent framebuffer fetch on Vulkan is on your roadmap. I'll probably have to implement the overlapping primitive detection algorithm to support all current Adreno devices anyway, so it's not an emergency. It'll just be nice to play with it and see the performance impact when it's available.
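For reference, the batch splitting described above could look roughly like the following C++ sketch. It is not code from the emulator: Rect, Overlaps and SplitBatches are hypothetical names, and it assumes a screen-space bounding box is available for each primitive. That makes the test conservative, since two triangles whose boxes overlap may not actually share a pixel. Each returned index marks where a new draw call would start, with a barrier recorded just before it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical screen-space bounding box of one primitive.
// (x0, y0) is inclusive, (x1, y1) is exclusive.
struct Rect
{
    int x0, y0, x1, y1;
};

// True if the two boxes share at least one pixel.
bool Overlaps(const Rect& a, const Rect& b)
{
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

// Returns the index of the first primitive of each batch.
// A new batch starts as soon as a primitive overlaps any primitive
// already in the current batch, so no batch contains overlapping boxes.
std::vector<std::size_t> SplitBatches(const std::vector<Rect>& prims)
{
    std::vector<std::size_t> batchStarts;
    if(prims.empty()) return batchStarts;

    std::size_t start = 0;
    batchStarts.push_back(start);
    for(std::size_t i = 1; i < prims.size(); i++)
    {
        // Only compare against primitives of the current batch.
        for(std::size_t j = start; j < i; j++)
        {
            if(Overlaps(prims[i], prims[j]))
            {
                // This is where the barrier would go before primitive i.
                start = i;
                batchStarts.push_back(start);
                break;
            }
        }
    }
    return batchStarts;
}
```

The inner loop is quadratic in the batch size in the worst case, which is exactly the kind of CPU cost I'd like to avoid; a coarse grid or tile mask would probably be cheaper, but I haven't measured either.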
Here's the link to the project in case you are interested: https://github.com/jpd002/Play-
Thank you very much for your reply!