According to the "Snapdragon_LLVM_ARM_37_Compiler_User_Guide", the compiler flags to be set to get best performance are simply:
-Ofast -mcpu=krait or -mcpu=cortex-a57
However, if I build with just those flags, I get this warning:
clangSnapdragon_LLVM_for_Android_37++.exe: warning: Vectorization flags ignored because armv7/armv8 and neon not set [-Wvectorizer-no-neon]
If I set -mfpu=neon the warning goes away, of course, but what is the best value of -mfpu for Krait and for the Snapdragon 810 cores?
Incidentally, I currently get about 10% better performance on both Krait and Cortex-A57 by using GCC 4.9 instead of LLVM for Snapdragon.
For any 32-bit compilation where you want auto-vectorization, you need to specify -mfpu=neon, since this flag is required to generate SIMD instructions in 32-bit mode. In 64-bit mode there is no need to specify -mfpu.
If you can share your kernel loop with us, we can help identify the performance degradation you are seeing with Snapdragon LLVM.
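As a quick sanity check that the NEON setting actually reached the compiler, something like the sketch below may help. It relies on the ACLE predefined macro __ARM_NEON (older GCC releases spell it __ARM_NEON__), which compilers define when NEON code generation is enabled. The function names neon_enabled and saxpy are made up for illustration and are not part of any toolchain, and the loop is only an example of the kind of code an auto-vectorizer can usually handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if this translation unit was compiled
   with NEON enabled (ACLE macro __ARM_NEON / legacy __ARM_NEON__), else 0. */
int neon_enabled(void) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Illustrative loop: y[i] += a * x[i].
   'restrict' tells the compiler that x and y do not overlap, which
   removes one common obstacle to auto-vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Building this file in 32-bit mode with -Ofast -mcpu=krait, once with and once without -mfpu=neon, and calling neon_enabled() (or comparing the generated assembly for saxpy) should show whether the flag took effect. In a 64-bit build the macro is expected to be defined without any -mfpu option.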
Thanks
Thanks. Unfortunately I cannot do that, as there isn't a single kernel loop: this is a complex computer vision project with a few dozen performance-critical algorithms.
I haven't done any profiling, but I suspect the bottleneck could be instruction decoding or instruction cache misses, since I get better performance in -mthumb mode. (Interestingly, the best performance with GCC is achieved when optimizing for cortex-a53.)