I'm toying with OpenCL on the Snapdragon platform and have been looking at the performance benefit on my Adreno 320 (Snapdragon 600AB). In a classic matrix multiply, I can only get about 0.5 GFLOPS out of the GPU, even after varying the work-group and tile sizes, whereas a single CPU core gets 1.3 GFLOPS with the same code.
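To be clear about what I mean by "tile size", here is a minimal CPU-side sketch of a blocked matrix multiply in plain Python. It is not the Intel sample and not an OpenCL kernel; the names `matmul_tiled` and `matmul_naive` are my own, for illustration only. `tile` is the edge length of the sub-block processed at once, which in a GPU kernel roughly corresponds to how much of A and B a work-group stages in local memory.

```python
def matmul_tiled(a, b, n, tile):
    """Blocked C = A * B for square n x n matrices stored as lists of lists.

    `tile` (hypothetical parameter name) is the edge length of the sub-blocks
    multiplied together; the last block in each dimension may be smaller.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Multiply the (ii, kk) tile of A by the (kk, jj) tile of B
                # and accumulate the result into the (ii, jj) tile of C.
                for i in range(ii, min(ii + tile, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def matmul_naive(a, b, n):
    """Straightforward triple loop, used as a reference for the tiled version."""
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Small integer-valued inputs keep every sum exact, so results compare equal.
n = 12
a = [[float((i * n + j) % 7) for j in range(n)] for i in range(n)]
b = [[float((i * n + j) % 5) for j in range(n)] for i in range(n)]
for tile in (1, 4, 5, 12):  # 5 does not divide 12, exercising partial tiles
    assert matmul_tiled(a, b, n, tile) == matmul_naive(a, b, n)
```

Changing `tile` only reorders the same multiply-adds, so the result is identical; what it changes is memory locality, which is why the best value depends on the platform.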
I realize that GFLOPS aren't the best measure and that code often needs to be tuned to the platform, but I'm wondering what kind of performance I should expect. I was expecting a few tens of GFLOPS, but perhaps something closer to 1 GFLOPS is more realistic, if disappointing. Does anyone have a performance data point for this platform that would confirm or refute my results?
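For the figures above I'm assuming the usual GEMM convention of 2·M·N·K floating-point operations (one multiply and one add per inner-product term); I haven't checked that the Intel sample counts them exactly this way. The helper names below are hypothetical. A quick sketch of converting between run time and GFLOPS:

```python
def gemm_flops(m, n, k):
    """Flop count for C = A * B with A m x k and B k x n.

    Each of the m * n outputs needs k multiplies and k adds: 2 * m * n * k.
    """
    return 2 * m * n * k


def gflops(m, n, k, seconds):
    """Achieved throughput in GFLOPS for one GEMM that took `seconds`."""
    return gemm_flops(m, n, k) / seconds / 1e9


def seconds_at(m, n, k, rate_gflops):
    """Run time implied by a given GFLOPS rate."""
    return gemm_flops(m, n, k) / (rate_gflops * 1e9)


# Hypothetical 1024 x 1024 problem, using the GPU and CPU rates I measured
# plus an assumed 20 GFLOPS as a stand-in for "a few tens of GFLOPS".
size = 1024
for label, rate in (("GPU", 0.5), ("one CPU core", 1.3), ("20 GFLOPS", 20.0)):
    print(f"{label}: {seconds_at(size, size, size, rate):.3f} s")
```

For a 1024 × 1024 multiply this puts 0.5 GFLOPS at roughly 4.3 s, versus about 0.1 s at 20 GFLOPS, which is the size of the gap I'm trying to account for.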
I was using the Intel GEMM sample for my experiments: https://software.intel.com/en-us/articles/gemm
Data from OpenCL on other Adreno GPUs would also be interesting. I'm trying to get a data point on what to expect.
Anybody? It seems like OpenCL development on the Adreno GPU is pretty much non-existent compared to NVIDIA, AMD, etc.