- I want to run a BERT model on the DSP for a text-pair matching task, so I need to run batched inputs.
- I used snpe-dlc-quantize to quantize my model (BERT), so I had to set batch_size to 1.
- At inference time, I want to change the batch size using the "input_dimensions" argument:
snpe-net-run --container model_encoder.dlc --input_list snpe_inputs/encoder_inputs_10.txt --output_dir snpe_output --input_name hidden_states --input_dimensions 4,1,128,312 --input_name attention_mask --input_dimensions 4,1,128,1
and I got this error:
error_code=107; error_message=Changing of dimensions is not supported by layer. error_code=107; error_message=Changing of dimensions is not supported by layer. MatMul layer name MatMul_20; error_component=System Configuration; line_no=1577; thread_id=139909755639552; error_component=System Configuration; line_no=342; thread_id=139909756180480
The failing layer is the self-attention Q*K calculation:
query: [batch, num_heads, seq_len, head_size], key: [batch, num_heads, head_size, seq_len]
MatMul(query, key)
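For reference, this MatMul is a batched matrix multiply over the leading (batch, num_heads) dimensions, which is why the runtime rejects a batch-size change on it. A minimal NumPy sketch of the same shape contract (the head count and head size below are hypothetical; only the tensor rank and which axes multiply matter):

```python
import numpy as np

# Hypothetical shapes for illustration: seq_len=128 as in the question,
# head count/size chosen only so the example runs.
batch, num_heads, seq_len, head_size = 4, 12, 128, 26

query = np.random.randn(batch, num_heads, seq_len, head_size).astype(np.float32)
key = np.random.randn(batch, num_heads, head_size, seq_len).astype(np.float32)

# Batched matmul: multiplies the last two axes, broadcasts over the rest.
scores = np.matmul(query, key)
print(scores.shape)  # (4, 12, 128, 128)
```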
sdk ver: 1.67.0
onnx version: 1.6.0
Is there any way I can change batch_size while using a quantized model? Thanks!
Dear developer,
Per my understanding, SNPE does not support multibatch inference if you quantized the model with a single batch. Please try quantizing your model with multibatch inputs, so that SNPE will recognize the multibatch dimension.
I use snpe-dlc-quantize to quantize my model, and it fails to quantize with multibatch.
The tutorial says: "The tool requires the batch dimension of the DLC input file to be set to 1 during the original model conversion step."
Can I convert an ONNX model with batch_size=4 to a DLC model, and then quantize that DLC model? How should I do that? Thank you.
The data I use for quantization looks like this; every file is a tensor with batch_size=1:
"
#output1name output2name
embeddings:=snpe_inputs/element_embeddings/0.raw attention_mask:=snpe_inputs/element_attention_mask/0.raw
embeddings:=snpe_inputs/element_embeddings/1.raw attention_mask:=snpe_inputs/element_attention_mask/1.raw
embeddings:=snpe_inputs/element_embeddings/2.raw attention_mask:=snpe_inputs/element_attention_mask/2.raw
"