When I run a ScrubberOpenCL on my Android app, it collects the runtime of the NDRange kernels and so on, but there are no metrics other than that. I already know which kernel is the problem (just from randomly hitting pause in the Visual Studio debugger and looking at which clFinish it was waiting on). I was hoping to get some insight into how to change the kernel to make it run better, as I am observing a slowdown going from Adreno 540 (S8) to Adreno 630 (S9). I'm also seeing significantly better perf with the exact same kernel code ported to Metal (the code was developed and tweaked in OpenCL on Android, so it's disturbing that a dumb port to Metal runs so much better on iPhone X). I see the tool has a "Metrics" dropdown button that is disabled in the ScrubberOpenCL. What's the deal with that?
In any case, there's a slowdown in the convolution layers of this DL app going from Adreno 540 to Adreno 630. I'm not sure what's going on, and the profiler isn't giving me enough stats to analyze it.