I've noticed that convolutional layers with a filter width >= 32 generate completely wrong output when running on the DSP, although they produce correct results on the CPU and GPU. I first suspected quantization, but I tried different filter dimensions with data confined to a narrow range, so that quantization wouldn't be a factor, and the problem persists.
Comparing the results of the GPU and DSP runtimes, I see an abrupt divergence at a filter width of 32. Both runtimes produce consistent results for filter widths from 1 to 31, but from 32 onward the DSP runtime produces incorrect results.
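For anyone who wants to run the same kind of comparison, here is a minimal sketch of how the per-width check could be automated. It is not the exact script I used. The function names, the tolerance value and the dictionary layout are all illustrative assumptions, and the outputs are assumed to be already loaded as NumPy arrays.

```python
import numpy as np


def max_abs_diff(reference, candidate):
    """Return the largest element-wise absolute difference between two tensors."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        raise ValueError(
            f"shape mismatch: {reference.shape} vs {candidate.shape}"
        )
    return float(np.max(np.abs(reference - candidate)))


def first_failing_width(outputs_by_width, tolerance=1e-2):
    """Find the smallest filter width at which the two runtimes disagree.

    outputs_by_width maps a filter width to a (gpu_output, dsp_output) pair.
    The tolerance is an assumed value; it should be loose enough to absorb
    ordinary quantization error on the DSP.
    Returns the first width whose difference exceeds the tolerance, or None.
    """
    for width in sorted(outputs_by_width):
        gpu_output, dsp_output = outputs_by_width[width]
        if max_abs_diff(gpu_output, dsp_output) > tolerance:
            return width
    return None
```

If the runtime outputs are stored as raw float32 dumps (an assumption about the file format), each one can be loaded with np.fromfile(path, dtype=np.float32) before being passed to first_failing_width.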
I can provide very basic DLC files that exhibit the problem.
I'm using SNPE 1.41.0.2173 on a Snapdragon 820 Automotive board. I'm not using a more recent version of the SNPE SDK because, as far as I understand, the latest versions don't support this board (see threads https://developer.qualcomm.com/forum/qdn-forums/software/qualcomm-neural... and https://developer.qualcomm.com/forum/qdn-forums/software/qualcomm-neural...).
I also managed to reproduce the issue with the latest SNPE version (1.49.0.2587 as of today) on an SA8155P Automotive Development Platform board.