snpe-pytorch-to-dlc needs an --output_layout flag; unusable on GPU otherwise
Posted: Mon, 2022-08-08 02:03
snpe-pytorch-to-dlc inserts an additional permute layer before each output node. This is done to convert from the DLC's NHWC layout to PyTorch's NCHW layout. If I actually want to get the output as NHWC, there is no way to do this. This is a real issue because the NCHW tensors are not usable with the GPU runtime, as the converter's own warning shows:
/home/gershon/git/snapdragon-poc/tools/snpe/lib/python/qti/aisw/converters/backend/ir_to_dlc.py:1049: RuntimeWarning: info_code=802; message=Layer parameter value is invalid in GPU. Layer permute_1 : output width = 266, depth = 476 width * depth (packed) = 31654 exceeds maximum image width 16384 for Adreno A650; component=GPU Runtime; line_no=1095; thread_id=140190701469888
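The numbers in the warning are consistent with the GPU runtime packing the depth axis four values per texel. This is an inference from the arithmetic, not something taken from SNPE documentation: 266 × ⌈476 / 4⌉ = 266 × 119 = 31654, which is above the 16384 image-width limit reported for the Adreno A650. A small Python check of that assumed formula (the helper names are illustrative only):

```python
import math

# Assumption: depth (the last axis) is packed 4 values per texel. This
# reproduces the "width * depth (packed) = 31654" figure in the warning.
PACK = 4
MAX_IMAGE_WIDTH = 16384  # limit quoted in the warning for Adreno A650


def packed_width(width: int, depth: int, pack: int = PACK) -> int:
    """Image width needed when `depth` values are packed `pack` per texel."""
    return width * math.ceil(depth / pack)


def fits_on_gpu(width: int, depth: int) -> bool:
    """True if the packed image width stays within the reported limit."""
    return packed_width(width, depth) <= MAX_IMAGE_WIDTH


if __name__ == "__main__":
    print(packed_width(266, 476), fits_on_gpu(266, 476))  # 31654 False
```

Under the same assumption, a last axis of 4 or fewer values (for example a small channel count in NHWC) packs into a single texel, so `packed_width(w, 3) == w` for any width `w`.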
In other words, as long as this bug is not fixed, we cannot run models produced by snpe-pytorch-to-dlc on the GPU.
The same problem exists for the input nodes; however, snpe-pytorch-to-dlc has an --input_layout argument that can be used to tell the script that an input is already NHWC and should not be permuted.
A similar argument is badly needed to eliminate the output permutes. It would be natural to call it --output_layout: if --output_layout <output-name> NHWC is set, the script should drop (not insert) the permute layer before that output.
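To make the proposal concrete: the inserted permute amounts to the axis order (0, 3, 1, 2) applied to the DLC's native NHWC output. The NumPy sketch below models the requested behavior; `emit_output` and the `output_layouts` mapping are hypothetical stand-ins for the proposed --output_layout option, not part of the real converter. It shows that skipping the permute simply leaves the tensor in NHWC.

```python
import numpy as np

# Axis order of the permute inserted before each output:
# NHWC (DLC native) -> NCHW (PyTorch).
NHWC_TO_NCHW = (0, 3, 1, 2)


def emit_output(name, nhwc_tensor, output_layouts):
    """Hypothetical model of the proposed --output_layout behavior.

    output_layouts maps output names to "NHWC" or "NCHW"; outputs that are
    not listed keep the current default (permute to NCHW).
    """
    if output_layouts.get(name, "NCHW") == "NHWC":
        return nhwc_tensor  # proposed: no permute layer at all
    return np.transpose(nhwc_tensor, NHWC_TO_NCHW)  # current behavior


x = np.zeros((1, 8, 16, 3), dtype=np.float32)  # N=1, H=8, W=16, C=3
print(emit_output("out", x, {}).shape)               # (1, 3, 8, 16)
print(emit_output("out", x, {"out": "NHWC"}).shape)  # (1, 8, 16, 3)
```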
Hi,
In SNPE, images must be presented as NHWC tensors, where the channel is the fastest-changing dimension. (NOTE: This is the default layout for SNPE.)
If an NCHW tensor layout is selected, the data and/or tensor parameters may need to be permuted to the SNPE default (NHWC).
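On the data side, converting NCHW to NHWC is a transpose, not a plain reshape. A minimal NumPy sketch with an illustrative array (how the resulting buffer is handed to SNPE is not shown):

```python
import numpy as np

NCHW_TO_NHWC = (0, 2, 3, 1)


def to_nhwc(nchw):
    # transpose moves the channel axis last; ascontiguousarray copies the
    # data so the memory order matches the new shape (channel-last).
    return np.ascontiguousarray(np.transpose(nchw, NCHW_TO_NHWC))


x = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)  # NCHW
y = to_nhwc(x)
print(y.shape)                                   # (2, 4, 5, 3)
print(y.ravel()[:3].tolist())                    # [0.0, 20.0, 40.0]
# A plain reshape yields the same shape but scrambles the values:
print(np.array_equal(y, x.reshape(2, 4, 5, 3)))  # False
```

The second print shows the channel being the fastest-changing dimension: the first three values in memory are the three channels of pixel (0, 0).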
Thanks.