I am new to OpenCL, and I found two blog posts about OpenCL:
I tested as the blog described, but I cannot reach the performance it reports.
In my test with a 1024 matrix, it takes about 52 ms (vs. 23 ms in the blog).
I tested on a Qualcomm 835.
My code is copied from the blog, except for the global/local range arguments.
Could anyone share the complete code from the blogs?
My email is email@example.com
Thanks very much!