I used TFLite to quantize my model, and the quantized model's accuracy is good. I then wrote the same range encodings into the SNPE quantized DLC model, but the accuracy on the SNPE DSP runtime is poor.
I used SNPE's debug mode to dump the dequantized output of each layer and compared it with the dequantized TFLite output. The two differ by a scale factor, and the difference grows as the number of layers increases.
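
A minimal sketch of how the "differ by a scale" observation can be quantified per layer, assuming both dequantized outputs have been flattened into lists of floats in the same element order. The function names and the sample values are hypothetical and only illustrate the comparison; they are not real layer dumps.

```python
import math

def estimate_scale(reference, candidate):
    """Least-squares scalar k that minimizes ||candidate - k * reference||^2."""
    num = sum(r * c for r, c in zip(reference, candidate))
    den = sum(r * r for r in reference)
    if den == 0.0:
        raise ValueError("reference output is all zeros")
    return num / den

def relative_residual(reference, candidate, k):
    """Relative error that remains after removing the scalar factor k."""
    err = math.sqrt(sum((c - k * r) ** 2 for r, c in zip(reference, candidate)))
    norm = math.sqrt(sum(c * c for c in candidate))
    return err / norm if norm else 0.0

# Hypothetical dequantized outputs of one layer (illustrative values only).
tflite_out = [0.10, -0.25, 0.40, 0.80]
snpe_out = [0.11, -0.275, 0.44, 0.88]

k = estimate_scale(tflite_out, snpe_out)
print("scale factor:", k)
print("residual after rescaling:", relative_residual(tflite_out, snpe_out, k))
```

If the residual stays near zero while k drifts away from 1.0 layer by layer, the mismatch is a pure scale error. A large residual would point to something else, such as different rounding or clamping.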
I am curious how the SNPE DSP performs int8 convolution and how it shifts and rounds the int32 accumulator back to int8. It seems to differ from TFLite, and I think this is the reason for the large difference between the two.
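
For reference, here is a sketch of the int32-to-int8 requantization step as I understand it from the TFLite reference kernels (the gemmlowp-style scheme: a Q31 fixed-point multiplier followed by a rounding right shift). It is written from memory and is not a copy of the TFLite source. The truncating variant is purely hypothetical and does not describe what the SNPE DSP actually does; it only shows how a different rounding rule changes the int8 result. All function names and the example scales are assumptions.

```python
import math

INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def quantize_multiplier(real_multiplier):
    """Split a positive real multiplier into a Q31 integer and a power-of-two shift."""
    if real_multiplier == 0.0:
        return 0, 0
    significand, shift = math.frexp(real_multiplier)  # significand in [0.5, 1)
    q = int(math.floor(significand * (1 << 31) + 0.5))
    if q == (1 << 31):  # rounding pushed the significand up to 1.0
        q //= 2
        shift += 1
    return q, shift

def saturating_rounding_doubling_high_mul(a, b):
    """Approximately round(a * b / 2^31), with ties away from zero and saturation."""
    if a == INT32_MIN and b == INT32_MIN:
        return INT32_MAX
    ab = a * b
    nudge = (1 << 30) if ab >= 0 else 1 - (1 << 30)
    total = ab + nudge
    magnitude = abs(total) // (1 << 31)  # emulate C-style truncating division
    return magnitude if total >= 0 else -magnitude

def rounding_divide_by_pot(x, exponent):
    """Divide by 2^exponent, rounding to nearest with ties away from zero."""
    mask = (1 << exponent) - 1
    remainder = x & mask
    threshold = (mask >> 1) + (1 if x < 0 else 0)
    return (x >> exponent) + (1 if remainder > threshold else 0)

def clamp_int8(v):
    return max(-128, min(127, v))

def requantize_tflite_style(acc, in_scale, w_scale, out_scale, out_zp=0):
    """Rescale an int32 accumulator to int8 with fixed-point rounding."""
    q, shift = quantize_multiplier(in_scale * w_scale / out_scale)
    left, right = (shift, 0) if shift > 0 else (0, -shift)
    scaled = saturating_rounding_doubling_high_mul(acc * (1 << left), q)
    return clamp_int8(rounding_divide_by_pot(scaled, right) + out_zp)

def requantize_truncating(acc, in_scale, w_scale, out_scale, out_zp=0):
    """Hypothetical alternative: one arithmetic shift that rounds toward -infinity."""
    q, shift = quantize_multiplier(in_scale * w_scale / out_scale)
    total_shift = 31 - shift
    if total_shift < 0:
        raise ValueError("multiplier too large for this sketch")
    return clamp_int8(((acc * q) >> total_shift) + out_zp)

# Example scales (assumed): effective multiplier = 0.02 * 0.005 / 0.05 = 0.002
for acc in (1000, 800, -800):
    print(acc,
          requantize_tflite_style(acc, 0.02, 0.005, 0.05),
          requantize_truncating(acc, 0.02, 0.005, 0.05))
```

With these scales an accumulator of 800 corresponds to 1.6, which the rounding variant maps to 2 and the truncating variant maps to 1. A one-step error like this in every layer can accumulate with depth, though it would not by itself explain a consistent scale factor.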