I took a pre-trained TF MobileNet v1, converted it to DLC, fed it floating-point preprocessed data, and got correct results on the CPU.
Then I prepared raw files for calibration, using the same preprocessing that is assumed during inference, and called snpe-dlc-quantize. It printed some min/max/delta/offset statistics to the console. I then fed the quantized model the same input I had used for the floating-point model and got garbage output.
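For reference, this is roughly how I produce the calibration inputs: each preprocessed image is dumped as a flat little-endian float32 .raw file, which is the format snpe-dlc-quantize consumes through its --input_list file. The file paths here are just placeholders.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Write one pre-processed image (e.g. 224*224*3 floats for MobileNet v1)
// as a flat little-endian float32 .raw file for snpe-dlc-quantize.
bool writeRaw(const std::string &path, const std::vector<float> &pixels) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char *>(pixels.data()),
              static_cast<std::streamsize>(pixels.size() * sizeof(float)));
    return out.good();
}
```

Each .raw path is then listed, one per line, in the text file passed to snpe-dlc-quantize via --input_list.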
I am trying to understand what is responsible for quantizing the input data. I do not see any new layers responsible for this in the DLC (I looked using snpe-dlc-viewer). I do see that the input layer now has an output encoding attribute related to quantization, but I do not understand whether this is a real layer that will quantize the input or just a placeholder.
This is how I feed the input (same code for the floating-point and quantized model):

```cpp
// Create an input tensor matching the network's input dimensions.
std::unique_ptr<zdl::DlSystem::ITensor> inputTensor =
    zdl::SNPE::SNPEFactory::getTensorFactory().createTensor(
        snpe->getInputDimensions());
zdl::DlSystem::ITensor *t = inputTensor.get();
// The tensor holds floats; get a raw pointer to its first element
// and fill the pre-processed picture through it.
float *tf = reinterpret_cast<float *>(&(*inputTensor->begin()));
// ... copy pre-processed pixels via the tf pointer ...
```
I have resolved the problem: it was the quantization with default parameters. I collected the outputs of every layer for both the floating-point and the quantized model and found that the results are quite close after the first convolution but diverge badly after the fifth, which meant the quantization parameters were poor. After reading the https://developer.qualcomm.com/docs/snpe/quantized_models.html article, I found a couple more parameters for snpe-dlc-quantize: "--optimizations cle bc" improved the accuracy, and "--bias_bitwidth 32" improved it even more.
At the same time, the accuracy drop is still large. On the first 1000 pictures I get 63% Top-1 accuracy with the floating-point model, but 59.1% with the quantized model on CPU and 58.1% on DSP.