Hello,
I am encountering issues while running a PyTorch model with the SNPE SDK on the HTP core of a Samsung Galaxy S24+. The model runs successfully on the CPU. However, with specific arguments passed to snpe-dlc-quantize, either the quantized DLC generation or the snpe-net-run command fails with an error when targeting the HTP. (Note: a simple quantized DLC created with all default arguments works fine on the HTP.)
Running Device Configuration:
Samsung Galaxy S24+
Chip: SD 8 Gen 3, DSP Arch: V75 (AIP not supported) [HTP_V75_SM8650_8MB]
Please find the attachment for more detailed device information
SNPE VERSION: 2.22.0.240425
Steps Followed:
- Converted the model from PyTorch -> ONNX -> DLC
- Loaded the required libraries, binaries, model, and raw data onto the S24+ (followed: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/tutorial_inceptionv3.html)
Issues Seen:
None of these errors occur when running on the CPU; I see them only with the HTP backend.
Commands for which the issue is seen:
snpe-dlc-quantize
- WEIGHTS_BITWIDTH=16
snpe-dlc-quantize --input_dlc model.dlc --input_list data/target_raw_list_abs.txt --output_dlc ./output_dlc/model_16.dlc --weights_bitwidth 16 --enable_htp (Works Fine)
snpe-net-run --container model_16.dlc --input_list target_raw_list.txt --use_dsp (Fails with ERROR)
ERROR: error_code=402; error_message=Network partition has failed. Fallback runtime needed for this offline cache record 0; error_component=Dl Network; line_no=131; thread_id=541088253184
- ACT_BITWIDTH=16
snpe-dlc-quantize --input_dlc model.dlc --input_list data/target_raw_list_abs.txt --output_dlc model_16.dlc --act_bitwidth 16 --enable_htp (Fails with Error)
Error:
Error: [USER_ERROR] initial_sequencer_dp.cc:264:ERROR:A single op, "q::mul_op" (Op ID: 515b400000ee8), requires 0x900000 bytes of TCM, which is greater than the TCM size of 0x800000!
[USER_ERROR] initial_sequencer_dp.cc:271:ERROR:The name of the failing op before optimization is: "q::QNN_ElementWiseBinary" (Op ID: ee8).
[USER_ERROR] QnnDsp <E> "/decoder_level1/decoder_level1.0/norm1/body/Mul_1" generated: Requires 0x900000 bytes of TCM, which is greater than the TCM size of 0x800000!
[USER_ERROR] QnnDsp <E> RouterX86 graph prepare failed 13
[USER_ERROR] QnnDsp <E> Failed to finalize graph (id: 1) with err 1002
[USER_ERROR] error code = 401; QnnGraph_finalize failed: 1002
[USER_INFO] Backend Mgr ~Dtor called for backend HTP
[USER_INFO] Cleaning up Context handle=0x1 for Graph Id=0 backend=HTP SNPE Id=0x58e22f5b2df8
error_code=401; error_message=Network creation has failed. QnnGraph_finalize failed: 1002; error_component=Dl Network; line_no=2377; thread_id=138961664092096
[USER_INFO] Done Cleaning up Context handle=0x1 for Graph Id=0 backend=HTP SNPE Id=0x58e22f5b2df8
[ERROR] SNPE HTP Offline Prepare: Could not generate cache record for subnet 0: {0, 1428}
[ERROR] SNPE HTP Offline Prepare: Failed to generate cache for SM8650
[INFO] ======== Run Summary ========
[INFO] SM8650 : Failed
[USER_INFO] BackendTerminate triggered
[INFO] DebugLog shutting down.
snpe-net-run --container model_16.dlc --input_list target_raw_list.txt --use_dsp (Fails with Error)
error_code=401; error_message=Network creation has failed. QnnGraph_finalize failed: 1002; error_component=Dl Network; line_no=131; thread_id=533471405312
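
For reference, here is a quick conversion of the two TCM sizes reported in the snpe-dlc-quantize error for the ACT_BITWIDTH=16 case above. This is only a minimal arithmetic sketch: the two hex values are copied from the log, while the variable and function names are my own and not part of SNPE.

```python
# Hex sizes copied verbatim from the "[USER_ERROR] ... requires 0x900000 bytes
# of TCM, which is greater than the TCM size of 0x800000" log line.
required_tcm_bytes = 0x900000   # what the single mul op asks for
available_tcm_bytes = 0x800000  # TCM size reported by the HTP prepare step

BYTES_PER_MIB = 1024 * 1024


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (hypothetical helper, not an SNPE API)."""
    return n_bytes / BYTES_PER_MIB


required_mib = to_mib(required_tcm_bytes)
available_mib = to_mib(available_tcm_bytes)
overshoot_mib = required_mib - available_mib

print(f"required : {required_mib:.0f} MiB")
print(f"available: {available_mib:.0f} MiB")
print(f"overshoot: {overshoot_mib:.0f} MiB")
```

So the failing op requests 9 MiB against the 8 MiB of TCM, which is consistent with the "8MB" in the HTP_V75_SM8650_8MB target name listed in the device configuration.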