Hello everyone,
I'm currently trying to convert my ONNX model (quantized and exported with AIMET) to the .dlc format with the SNPE toolkit, using the snpe-onnx-to-dlc command.
My model consists mainly of 2D convolution, ReLU, and transposed convolution operations (https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html) that use output padding. During the conversion I hit a shape mismatch error:
```
Traceback (most recent call last):
  File "/snpe/snpe-1.47.0.2501/lib/python/qti/aisw/converters/common/utils/translation_utils.py", line 125, in get_broadcasted_shape
    output_shape = list(np.broadcast(*inputs).shape)
ValueError: shape mismatch: objects cannot be broadcast to a single shape

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/snpe/snpe-1.47.0.2501/lib/python/qti/aisw/converters/onnx/onnx_to_ir.py", line 157, in convert
    self.graph)
  File "/snpe/snpe-1.47.0.2501/lib/python/qti/aisw/converters/common/converter_ir/translation.py", line 51, in apply_method_to_op
    return translation.apply_method(method_name, *args, **kwargs)
  File "/snpe/snpe-1.47.0.2501/lib/python/qti/aisw/converters/common/converter_ir/translation.py", line 17, in apply_method
    return self.indexed_methods[method_name](*args, **kwargs)
  File "/snpe/snpe-1.47.0.2501/lib/python/qti/aisw/converters/onnx/onnx_translations.py", line 200, in add_op
    broadcast_shape = translation_utils.get_broadcasted_shape(input_shapes)
  File "/snpe/snpe-1.47.0.2501/lib/python/qti/aisw/converters/common/utils/translation_utils.py", line 127, in get_broadcasted_shape
    raise ValueError("Shape mismatch, {} cannot be broadcast to a single shape".format(input_shapes))
```
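To illustrate what I believe is happening, here is a small pure-Python check (no framework needed) comparing the PyTorch/ONNX transposed-convolution output-size formula, which includes output_padding, against SNPE's explicit-padding formula, which doesn't. The concrete numbers (16 input, 3x3 kernel, stride 2, padding 1, output_padding 1) are just an illustrative example, not my actual model:

```python
# PyTorch/ONNX formula for ConvTranspose output size (includes output_padding):
#   out = stride*(in - 1) - 2*pad + dilation*(kernel - 1) + output_padding + 1
def convtranspose_out(in_size, kernel, stride, pad, output_padding, dilation=1):
    return stride * (in_size - 1) - 2 * pad + dilation * (kernel - 1) + output_padding + 1

# SNPE's explicit-padding deconv formula (no output_padding term):
def snpe_deconv_out(in_size, kernel, stride, pad_before, pad_after):
    return stride * (in_size - 1) - (pad_before + pad_after) + kernel

# Example: in=16, kernel=3, stride=2, pad=1, output_padding=1
print(convtranspose_out(16, 3, 2, 1, 1))  # 32 -- what PyTorch/ONNX produces
print(snpe_deconv_out(16, 3, 2, 1, 1))    # 31 -- off by output_padding
```

The one-pixel difference then surfaces later in the graph (in my case at an elementwise add), where the mismatched shapes can no longer be broadcast.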
After some investigation, the shape mismatch appears to be caused by the function that calculates the output dimensions of the transposed convolution. The function does not take output padding into account, so the output dimension is computed incorrectly. I'm wondering if anyone has come across this issue and whether there's a workaround for it (my fallback is to switch entirely to bilinear upsampling, but I want to make sure I've tried everything first).
Note: this is the output-dimension calculation for the deconv operation in the SNPE toolkit:
```
def calc_deconv_output_dim(input_size, filter_size, pad_before, pad_after, stride, padding_size_strategy):
    if padding_size_strategy == IRPaddingStrategies.PADDING_SIZE_IMPLICIT_VALID:
        output_dim = input_size * stride + max(filter_size - stride, 0)
    elif padding_size_strategy == IRPaddingStrategies.PADDING_SIZE_IMPLICIT_SAME_BEGIN \
            or padding_size_strategy == IRPaddingStrategies.PADDING_SIZE_IMPLICIT_SAME_END:
        output_dim = input_size * stride
    else:  # EXPLICIT, EXPLICIT_FLOOR or UNDEFINED
        output_dim = stride * (input_size - 1) - (pad_before + pad_after) + filter_size
    return int(output_dim)
```
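For reference, here is a sketch of how the explicit-padding branch could be patched locally to account for output padding. This is my own hypothetical fix, not an official SNPE change; the extra `output_padding` parameter and the function name are mine:

```python
# Hypothetical patched sizing (sketch): add the output_padding term to the
# explicit-padding branch so the result matches the PyTorch/ONNX formula.
def calc_deconv_output_dim_patched(input_size, filter_size, pad_before, pad_after,
                                   stride, output_padding=0):
    # Only the EXPLICIT / EXPLICIT_FLOOR / UNDEFINED branch, for brevity.
    output_dim = (stride * (input_size - 1) - (pad_before + pad_after)
                  + filter_size + output_padding)
    return int(output_dim)

print(calc_deconv_output_dim_patched(16, 3, 1, 1, 2, output_padding=1))  # 32
print(calc_deconv_output_dim_patched(16, 3, 1, 1, 2))                    # 31 (current behavior)
```

With `output_padding=0` it reduces to the current SNPE formula, so in principle it would be backward compatible; the converter would also need to read the `output_padding` attribute from the ONNX ConvTranspose node and pass it through.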