Hi,
I got results I don't understand when I ran the benchmarking Python scripts provided with SNPE v1.10.1.
I used a converted DLC of ResNet-152 and compared per-layer benchmark timings between the GPU and DSP runtimes on a Snapdragon 835.
The results show that on the GPU, convolutional layers take much more time than ReLU layers (which are almost zero).
On the DSP, however, it is the opposite: ReLU layers take more time than convolutional layers (which are almost zero).
This seems strange to me, and I could not find an explanation for it.
So I would like to ask whether these benchmarking results are correct.
If they are wrong, I would also like to know the correct benchmarking method.
Thanks in advance.
The DSP merges layers together at initialization time so that it can run more efficiently. One easy merge is to fuse a convolution followed by a ReLU into a single layer. An artifact of this is that the Conv + ReLU time is accounted for mainly in the ReLU layer's reported time. It's just the way things are implemented. Rest assured that most of the time you are seeing is due to the Conv layer.
The GPU does the same fusion, but the way it's implemented, the time for Conv + ReLU shows up in the Conv layer, and the ReLU layer reports zero.
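If you want to compare the two runtimes fairly, one option is to fold each ReLU's reported time into the preceding Conv layer before comparing. Here is a minimal sketch of that idea; the `(layer_name, layer_type, time_us)` tuple format is a hypothetical representation for illustration, not the actual layout of the SNPE benchmark output files.

```python
# Sketch: aggregate fused Conv + ReLU layer times so GPU and DSP
# per-layer numbers become comparable. The input format here is a
# hypothetical list of (layer_name, layer_type, time_us) tuples,
# NOT the real SNPE benchmark CSV layout.

def aggregate_fused_times(layers):
    """Fold each ReLU's reported time into the preceding Conv layer.

    Because the runtime may attribute the fused Conv+ReLU cost to
    either layer, summing each adjacent Conv/ReLU pair yields a
    number that is comparable across GPU and DSP.
    """
    merged = []
    for name, ltype, t in layers:
        if ltype == "relu" and merged and merged[-1][1] == "conv":
            prev_name, _, prev_t = merged[-1]
            merged[-1] = (prev_name + "+" + name, "conv", prev_t + t)
        else:
            merged.append((name, ltype, t))
    return merged

# DSP-style attribution: the fused cost shows up in the ReLU row.
dsp = [("conv1", "conv", 5), ("relu1", "relu", 1200), ("fc", "fc", 300)]
# GPU-style attribution: the fused cost shows up in the Conv row.
gpu = [("conv1", "conv", 1100), ("relu1", "relu", 0), ("fc", "fc", 280)]

print(aggregate_fused_times(dsp))  # conv1+relu1 totals 1205
print(aggregate_fused_times(gpu))  # conv1+relu1 totals 1100
```

After aggregation, each fused Conv+ReLU pair carries one combined time on both runtimes, so the comparison no longer depends on which layer the runtime happened to charge.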
Runtime layer optimization by merging is mentioned in the user's guide, in the benchmarking chapter.