Hi guys,
I have a Unet model I want to benchmark on CPU, GPU and DSP. I'm using an Open-Q 820 board.
The Unet output is a 128x128 segmentation map with two classes, 0 and 1.
I'm using SNPE 1.14.0 and 1.19.2, and TensorFlow 1.10.0.
I validate the SNPE outputs against the TensorFlow outputs using the intersection over union (IoU) metric.
CPU and GPU give good results (IoU ≈ 1), for both SNPE 1.14.0 and 1.19.2.
DSP with SNPE 1.14.0 gives good results (IoU ≈ 0.99) but SNPE 1.19.2 gives very poor results (IoU ≈ 0.05).
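For reference, the IoU check I use is just the standard binary IoU over the two segmentation maps. A minimal sketch (the function name `iou_binary` is mine, not part of SNPE or TensorFlow):

```python
import numpy as np

def iou_binary(pred, ref):
    """Intersection over union for two binary (0/1) segmentation maps."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    inter = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    # Two empty masks are considered a perfect match.
    return 1.0 if union == 0 else float(inter) / float(union)
```

An IoU near 1 means the runtime reproduces the TensorFlow segmentation almost exactly; 0.05 means the DSP output is essentially unrelated to the reference.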
I tried both quantizing manually and letting SNPE quantize on its own; it makes no difference.
I also checked the quantization parameters with both SNPE versions and made sure they are identical, which means the SNPE runtime itself is making the difference.
I took a closer look at the outputs of each Unet layer: both SNPE versions give exactly the same results up until the first conv2d_transpose layer (a deconvolution layer in SNPE terms), and from there the error propagates deeper into the Unet.
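To localize where the two runs diverge, I compare the per-layer outputs. A minimal sketch of that comparison, assuming both runs dump each layer as a raw float32 file with matching relative names (e.g. per-layer debug output from snpe-net-run); the directory layout and the helper name `layer_max_diffs` are my assumptions:

```python
import numpy as np
from pathlib import Path

def layer_max_diffs(dir_a, dir_b):
    """Max absolute difference per layer between two sets of raw float32
    dumps. Assumes matching relative file names in both directories."""
    diffs = {}
    for f in sorted(Path(dir_a).rglob("*.raw")):
        other = Path(dir_b) / f.relative_to(dir_a)
        if not other.exists():
            continue
        a = np.fromfile(str(f), dtype=np.float32)
        b = np.fromfile(str(other), dtype=np.float32)
        # Mismatched tensor sizes are reported as infinite error.
        diffs[str(f.relative_to(dir_a))] = (
            float(np.abs(a - b).max()) if a.size == b.size else float("inf")
        )
    return diffs
```

With this, the first layer whose max difference jumps from ~0 to something large is the deconvolution layer mentioned above.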
I can provide a .dlc model of my deconvolution layer for both SNPE versions.
I was really lucky that I still had SNPE 1.14.0 from before, because it can no longer be downloaded, but my Unet is about 10-15% faster with SNPE 1.19.2, so I want to use 1.19.2 if possible.
Regards,
Nikola.
P.S.
I also tried SNPE 1.18.0 and it gives poor results as well.
I ran into the same problem using snpe-1.29.0.
CPU and GPU both gave correct answers when validating the result values layer by layer, while the DSP went wrong at the deconvolution layer.
Any solution or workaround?