We have implemented the same algorithm using Vulkan and OpenCL, with virtually the same code in their respective shaders and kernels. We have profiled it on two devices: a Samsung S10 with SD855 and a Samsung S8 with SD835, both running Android 9 with the latest September Update. The results are really odd: the Vulkan implementation on the S10 runs much slower than the OpenCL one, and even slightly slower than the Vulkan implementation on the S8. We are using float32 for both implementations. See below for details (speed normalized to the performance of S10/Vulkan; higher is faster):
S10 - Vulkan: 1.0x, OpenCL: 2.65x
S8 - Vulkan: 1.23x, OpenCL: 1.48x
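To make the comparison explicit, here is a small sketch that derives the per-device OpenCL-to-Vulkan ratio from the four numbers above. The dictionary layout and the function name `opencl_over_vulkan` are only for illustration; the values are the normalized figures already quoted, not new measurements.

```python
# Relative speeds as posted above, normalized to S10/Vulkan (higher is faster).
speeds = {
    ("S10", "Vulkan"): 1.00,
    ("S10", "OpenCL"): 2.65,
    ("S8", "Vulkan"): 1.23,
    ("S8", "OpenCL"): 1.48,
}


def opencl_over_vulkan(device):
    """Return how many times faster OpenCL is than Vulkan on one device."""
    return speeds[(device, "OpenCL")] / speeds[(device, "Vulkan")]


for device in ("S10", "S8"):
    ratio = opencl_over_vulkan(device)
    print(f"{device}: OpenCL runs at {ratio:.2f}x the speed of Vulkan")
```

On the S8 the two APIs come out roughly 1.2x apart, while on the S10 the gap is 2.65x, so the slowdown looks specific to Vulkan on the S10.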
Snapdragon Profiler shows that the GPU clock isn't maxed out when running Vulkan on the S10. It hovers around 280 MHz, whereas it shoots up to 600+ MHz when running OpenCL on the S10, as well as when running either Vulkan or OpenCL on the S8. Keep in mind that we are using the same APK for each profiling run. You can see screenshots of the profiler in the link below: Orange -> OpenCL on S10, Blue -> Vulkan on S10 and Red -> Vulkan on S8.
Any insight into this behavior would be really appreciated.