Hello,
I have a question about OpenCL queue optimization. We are doing some processing with OpenCL in which many kernels are executed serially in a single queue. All the kernels are fairly light (each takes about 100–300 µs), but the whole sequence takes much longer than the sum of the individual execution times measured with OpenCL events. Using the Adreno Profiler, we found that there are gaps ("holes") between kernel executions, and they are rather large: about 300–1000 µs.
Are such holes normal for an Adreno OpenCL queue, or are we doing something wrong?
And how can we reduce the time spent in these holes?
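
For anyone who wants to reproduce the numbers, here is a minimal sketch of how the holes could be quantified from event timestamps. It is not our actual code. It assumes the queue was created with CL_QUEUE_PROFILING_ENABLE and that each kernel's start and end times (in nanoseconds) were already read with clGetEventProfilingInfo using CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. The names kernel_span, total_exec_ns and total_gap_ns are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: device start/end time of one kernel in nanoseconds,
   as would be obtained from clGetEventProfilingInfo. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_span;

/* Sum of the pure execution times of all kernels. */
static uint64_t total_exec_ns(const kernel_span *spans, size_t n)
{
    assert(spans != NULL || n == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += spans[i].end_ns - spans[i].start_ns;
    return sum;
}

/* Sum of the idle gaps between consecutive kernels.
   Assumes spans are given in the order the kernels were enqueued. */
static uint64_t total_gap_ns(const kernel_span *spans, size_t n)
{
    assert(spans != NULL || n == 0);
    uint64_t sum = 0;
    for (size_t i = 1; i < n; ++i) {
        /* Only count a gap if the next kernel started after the previous ended. */
        if (spans[i].start_ns > spans[i - 1].end_ns)
            sum += spans[i].start_ns - spans[i - 1].end_ns;
    }
    return sum;
}
```

Comparing total_gap_ns with total_exec_ns for one pass over the queue shows how much of the wall-clock time is idle time between kernels rather than kernel execution.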