Hi,
I am new to OpenCL development on mobile GPUs, but I have a lot of experience with desktop GPUs.
I have read the OpenCL programming guide and the Shaders_Best_Practices guide. Those documents say that threads on Adreno GPUs can diverge, but I can't find any information about the wavefront (warp) size on Adreno devices.
On desktop, AMD GPUs have a wavefront size of 64 threads, and Nvidia GPUs have a warp size of 32. This information is very important for choosing the best workgroup size and for optimizing code.
Can someone provide this information?
Thanks, Pavel.
Since there has been no answer, I tried various workgroup sizes in a buffer-copy kernel. To my mind, it looks like the wavefront size is 64.
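To show why workgroup sizes that are multiples of the wavefront size tend to stand out in such a test, here is a minimal sketch of the arithmetic. It assumes a wave size of 64 (my measurement, not a documented value) and assumes the hardware pads a partial wave with idle lanes, as desktop GPUs do. The function names are just illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed wave size, taken from the measurement above; not a documented value. */
#define ASSUMED_WAVE_SIZE 64

/* Number of waves needed to cover one workgroup (rounded up). */
size_t waves_per_group(size_t group_size, size_t wave_size)
{
    assert(wave_size > 0);
    return (group_size + wave_size - 1) / wave_size;
}

/* Fraction of lanes doing useful work, in the range (0, 1]. */
double lane_utilization(size_t group_size, size_t wave_size)
{
    assert(group_size > 0);
    size_t lanes = waves_per_group(group_size, wave_size) * wave_size;
    return (double)group_size / (double)lanes;
}
```

Under these assumptions, a workgroup of 96 occupies two waves and uses only 75% of the lanes, while 64 or 128 uses all of them.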
Soon I will try kernels with branches and update this post.
I also found that the most efficient data size per work-item is uint4; it works 3 times faster than uint or char4.
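For reference, a rough sketch of what the uint4 variant of a copy kernel might look like, plus the work-item count it implies. The kernel name and the helper are hypothetical, and the kernel assumes the buffer size is a multiple of 16 bytes, since it has no bounds check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OpenCL C source for a copy kernel that moves one uint4
   (16 bytes) per work-item; the kernel name is illustrative. */
const char *copy_uint4_src =
    "__kernel void copy_uint4(__global const uint4 *src,\n"
    "                         __global uint4 *dst)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    dst[i] = src[i];\n"
    "}\n";

/* Work-items needed to copy buffer_bytes when each work-item
   moves bytes_per_item bytes (rounded up). */
size_t work_items_for(size_t buffer_bytes, size_t bytes_per_item)
{
    assert(bytes_per_item > 0);
    return (buffer_bytes + bytes_per_item - 1) / bytes_per_item;
}
```

For a 1 MiB buffer this gives 65536 work-items with uint4 versus 262144 with uint or char4, which are both 4 bytes wide. That would fit with uint and char4 performing alike, though it does not by itself explain the 3x difference.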