Hi,
Recently I trained a network with TensorFlow and converted it to DLC; it runs on the DSP runtime.
However, one conv2d layer with a 1x13 filter takes about 200 ms.
After changing the filter size to 1x9, the same conv2d layer takes only 4 ms.
With the filter size set to 1x10, the layer takes about 130 ms.
Is this a limitation of the DSP runtime? It is not documented in the SNPE docs.
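For reference, the raw compute difference between the kernel sizes is small, which is why the timing gap looks like a runtime issue rather than extra work. A rough MAC-count sketch (hypothetical layer shapes, since my exact dimensions aren't shown here) illustrates this:

```python
# Rough MAC (multiply-accumulate) count for a conv2d layer with a 1xK kernel.
# All shapes below are hypothetical placeholders, not my actual layer.
def conv2d_macs(h, w, c_in, c_out, kh, kw):
    # Assuming "same" padding and stride 1: one output per input position.
    return h * w * c_in * c_out * kh * kw

h, w, c_in, c_out = 32, 256, 64, 64  # hypothetical feature-map / channel sizes

macs_1x13 = conv2d_macs(h, w, c_in, c_out, 1, 13)
macs_1x9 = conv2d_macs(h, w, c_in, c_out, 1, 9)

# 13/9 ~= 1.44x more arithmetic, nowhere near the ~50x time gap (200 ms vs 4 ms).
print(macs_1x13 / macs_1x9)
```

So a 1x13 kernel does only about 44% more arithmetic than a 1x9 kernel, yet the measured time is roughly 50x longer, which is what makes me suspect a DSP-specific limitation (e.g. a fast path that only covers certain kernel widths).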