Optimization bottleneck
Fans0014
Join Date: 10 May 17
Posts: 8
Posted: Tue, 2018-04-24 01:18

In my neural network there are so many layers, each calling clEnqueueNDRangeKernel, and the GPU computation per kernel is so light, that the host-side time spent on these calls is much greater than the kernel execution time.
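Why light kernels leave the GPU idle can be seen from a rough cost model. The sketch below is Python, not OpenCL code, and assumes each launch costs a fixed `host_us` of host-side enqueue work plus `gpu_us` of kernel time. The function names and the values (200 launches, 250 µs host, 50 µs GPU) are hypothetical and chosen only for illustration.

```python
def serialized_us(n, host_us, gpu_us):
    """Host waits for each kernel before enqueuing the next (e.g. a blocking
    call such as clFinish after every enqueue): the costs simply add up."""
    return n * (host_us + gpu_us)


def pipelined_us(n, host_us, gpu_us):
    """Host enqueues back-to-back without waiting; an in-order queue starts
    each kernel once it has been submitted and the previous one is done."""
    host_t = 0.0  # time at which the host finishes submitting kernel i
    gpu_t = 0.0   # time at which the GPU finishes kernel i
    for _ in range(n):
        host_t += host_us
        gpu_t = max(host_t, gpu_t) + gpu_us
    return gpu_t


if __name__ == "__main__":
    n, host_us, gpu_us = 200, 250.0, 50.0  # hypothetical values
    print(f"serialized: {serialized_us(n, host_us, gpu_us):.0f} us")
    print(f"pipelined:  {pipelined_us(n, host_us, gpu_us):.0f} us")
    print(f"GPU busy:   {n * gpu_us:.0f} us")
```

With these values the serialized case takes 60000 µs while the GPU is busy for only 10000 µs, and even perfect overlap only brings the total down to 50050 µs. Under this model, whenever `host_us` exceeds `gpu_us` the host sets the pace, so the remaining levers are fewer launches or cheaper launches.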

Sure, it's obvious that we should focus on simplifying our network. But I would like to know whether there are any tricks to reduce the number of API calls issued to the device, or to reduce the time consumed by each API call. In my project, the whole program takes about 60 ms to execute, but the GPU kernel execution time is less than 10 ms.
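One common way to reduce the number of enqueue calls is kernel fusion: merging consecutive element-wise layers into the kernel of the layer before them, so one launch does the work of several. The Python sketch below only illustrates the idea on the CPU under that assumption; the layer names (`conv`, `bias`, `relu`, `pool`) and the helper `count_launches` are hypothetical, not part of any OpenCL API.

```python
def add_bias(x, b):
    """Bias layer on its own: one pass over x (think: one kernel launch)."""
    return [v + b for v in x]


def relu(x):
    """ReLU layer on its own: a second pass (a second launch)."""
    return [v if v > 0.0 else 0.0 for v in x]


def add_bias_relu(x, b):
    """Both layers fused: a single pass (a single launch). An equivalent
    OpenCL C kernel would read x[i] once, add b, clamp at zero, write back."""
    out = []
    for v in x:
        s = v + b
        out.append(s if s > 0.0 else 0.0)
    return out


def count_launches(layers, elementwise=frozenset({"bias", "relu"})):
    """Hypothetical launch counter: an element-wise layer is folded into the
    kernel of the layer before it; every other layer needs its own launch."""
    launches = 0
    for i, name in enumerate(layers):
        if i > 0 and name in elementwise:
            continue  # fused into the previous layer's kernel
        launches += 1
    return launches


if __name__ == "__main__":
    net = ["conv", "bias", "relu", "conv", "bias", "relu", "pool"]
    print(len(net), "launches unfused,", count_launches(net), "fused")
```

For this toy network the launch count drops from 7 to 3 while the fused and unfused versions compute the same values, which is exactly the kind of saving that matters when per-call host overhead dominates.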

Here is the profiler graph:

https://drive.google.com/file/d/1ex0v4ut981i6lrNM22GHBrXnaAl_wQmV/view?u...

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries (“Qualcomm”). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.