Hi,
I was running FFT2D.cl on Android 5.0.2 on a Snapdragon 805 with the Adreno Profiler OpenCL scrubber enabled for profiling. I noticed that the time reported by the app is very different from the time reported by the scrubber. On my system, the timer function clock() used in the framework's FrmUtils_Platform.cpp seems to measure only the CPU time used by the process. The timer therefore does not advance while the thread is blocked in calls such as clFinish(), so the value it returns is much lower than what the scrubber reports.
More importantly, I have some performance questions about how fast global memory objects are transferred between kernels in my test runs. Here are some results extracted from the scrubber for the FFT2D sample:
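To illustrate what I mean, here is a minimal sketch, assuming a Linux/Android-style clock() that counts process CPU time. A sleep stands in for a blocking call such as clFinish(), and the names Timings, time_both and blocking_stand_in are made up for this example; they are not part of the SDK framework.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Holds both measurements for one timed region, in seconds.
struct Timings {
    double cpu_seconds;   // from std::clock(): CPU time used by the process
    double wall_seconds;  // from std::chrono::steady_clock: elapsed real time
};

// Runs `work` once and measures it with both clocks.
template <typename Fn>
Timings time_both(Fn&& work) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    work();

    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();

    Timings t;
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    return t;
}

// Hypothetical stand-in for a blocking call such as clFinish():
// the thread sleeps and uses almost no CPU while it waits.
inline void blocking_stand_in() {
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
}

// Usage: Timings t = time_both(blocking_stand_in);
// On a system where clock() counts CPU time, t.cpu_seconds stays close to
// zero while t.wall_seconds covers the full wait.
```

If this is what is happening in the sample, switching the framework timer to a monotonic wall clock should bring the app's number closer to the scrubber's, though I have not verified that on the device.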
Execution Duration (us)  Description      % ALU Utilization  % Global Read L2 Hit  # Bytes Read
2151                     FFT2DRadix       14.68%             87.50%                8388608
1160                     FFT2DRadix4      20.08%             74.83%                8445248
1495                     FFT2DRadix4      20.31%             74.67%                8499008
...
2217                     MatrixTranspose  3.98%              75.00%                8447040
And so on for the remaining kernels.
Do these numbers look correct? The % ALU utilization looks fairly low, and the kernels seem to be bound by data transfer. Have I missed some configuration setting that could have hurt the data transfer speed? The system I am running on is an Inforce 6501 development kit (APQ8084, Snapdragon 805) running a build of Android 5.0.2.
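For reference, this is roughly how I turn the scrubber figures into an effective read bandwidth. It is only a sketch: it assumes "# Bytes Read" is the total for the whole kernel execution, it ignores writes entirely, and the helper name effective_read_gbps is my own.

```cpp
#include <cstdint>

// Effective read bandwidth in GB/s (1 GB = 1e9 bytes).
// bytes / microseconds gives 1e6 bytes per second (MB/s);
// dividing by 1000 converts that to GB/s.
inline double effective_read_gbps(std::uint64_t bytes_read,
                                  double duration_us) {
    return static_cast<double>(bytes_read) / duration_us / 1000.0;
}

// Usage with the first row of the scrubber output:
//   effective_read_gbps(8388608, 2151.0)
```

By this estimate the first kernel reads at a little under 4 GB/s and the fastest FFT2DRadix4 run at a little over 7 GB/s.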
Thanks.