Hi there. I'm trying to run YOLOv8m with SNPE 2.16 on a Xiaomi 14 (SM8650, Snapdragon 8 Gen 3), but it fails. You can get my model at https://github.com/lsCoding666/SNPE-YOLOV8-MODEL/blob/main/model/yolov8m_htp.dlc
Code and screenshots are at https://github.com/lsCoding666/SNPE-YOLOV8-MODEL/tree/main
ONNX Version 1.15
SNPE version 2.16.0
The model is quantized.
The problem: the quantized model works when run on the GPU, but fails when run on the DSP.
Run on DSP (note line 28 in the screenshot):
https://github.com/lsCoding666/SNPE-YOLOV8-MODEL/blob/main/run_on_dsp.png
Run on GPU
https://github.com/lsCoding666/SNPE-YOLOV8-MODEL/blob/main/run_on_gpu.png
And here is the code:
https://github.com/lsCoding666/SNPE-YOLOV8-MODEL/blob/main/Person_Detect...
```
cv::Mat Person_Detect::ProcessImgYoloV8(cv::Mat mat, char *pJstring) {
    img_mat = mat;

    // Letterbox resize: scale to fit 640x640 while keeping aspect ratio, then pad with gray (114).
    cv::Mat input_mat;
    im_scale = std::min((float) INPUT_WIDTH / img_mat.cols, (float) INPUT_HEIGHT / img_mat.rows);
    int new_w = int(img_mat.cols * im_scale);
    int new_h = int(img_mat.rows * im_scale);
    cv::resize(img_mat, input_mat, cv::Size(new_w, new_h));
    int p_w = INPUT_WIDTH - new_w;
    int p_h = INPUT_HEIGHT - new_h; // was INPUT_WIDTH; only harmless while width == height
    int top = p_h / 2;
    int bottom = p_h - top;
    int left = p_w / 2;
    int right = p_w - left;
    cv::copyMakeBorder(input_mat, input_mat,
                       top, bottom, left, right,
                       cv::BORDER_CONSTANT,
                       cv::Scalar(114, 114, 114));

    // Run inference.
    zdl::DlSystem::TensorMap output_tensor_map = qc->predict(input_mat);
    zdl::DlSystem::StringList out_tensors = output_tensor_map.getTensorNames();

    // Copy every output tensor into a map of float vectors (handy for debugging).
    std::map<std::string, std::vector<float>> out_itensor_map;
    for (size_t i = 0; i < out_tensors.size(); i++) {
        zdl::DlSystem::ITensor *out_itensor = output_tensor_map.getTensor(out_tensors.at(i));
        std::vector<float> out_vec{reinterpret_cast<float *>(&(*out_itensor->begin())),
                                   reinterpret_cast<float *>(&(*out_itensor->end()))};
        out_itensor_map.insert(std::make_pair(std::string(out_tensors.at(i)), out_vec));
    }

    // Decode the first output tensor into boxes.
    std::vector<BoxInfo> result;
    zdl::DlSystem::ITensor *out_itensor = output_tensor_map.getTensor(out_tensors.at(0));
    auto boxes = Person_Detect::decode_inferV8(out_itensor->begin().dataPointer(),
                                               {(int) img_mat.cols, (int) img_mat.rows},
                                               left, top,
                                               class_list.size(),
                                               CONFIDENCE_THRESHOLD);
    result.insert(result.begin(), boxes.begin(), boxes.end());

    // Non-maximum suppression.
    Person_Detect::nms(result, NMS_THRESHOLD);

    // Draw labels and rectangles.
    for (size_t i = 0; i < result.size(); ++i) {
        auto detection = result[i];
        __android_log_print(ANDROID_LOG_INFO, LOG_TAG, "label %d", detection.label);
        __android_log_print(ANDROID_LOG_INFO, LOG_TAG, "score %f", detection.score);
        cv::Scalar color = cv::Scalar(255, 255, 0);
        cv::rectangle(img_mat, cv::Point(detection.x1, detection.y1),
                      cv::Point(detection.x2, detection.y2),
                      color, 2);
        cv::rectangle(img_mat, cv::Point(detection.x1, detection.y1 - 20),
                      cv::Point(detection.x2, detection.y1),
                      color, -1);
        std::stringstream ss;
        ss << class_list[detection.label] << " " << detection.score;
        cv::putText(img_mat, ss.str(), cv::Point(detection.x1, detection.y1),
                    cv::FONT_HERSHEY_COMPLEX, 0.8,
                    cv::Scalar(0, 0, 0), 2);
    }

    // Save the annotated image.
    std::string out_path = std::string("/storage/emulated/0/testresult/") + pJstring + ".jpg";
    cv::cvtColor(img_mat, img_mat, cv::COLOR_RGB2BGR);
    cv::imwrite(out_path, img_mat);
    pred_out.clear();
    return img_mat;
}
```
Here is decode_inferV8:
```
std::vector<BoxInfo>
Person_Detect::decode_inferV8(float *dataSource, const YoloSize &frame_size,
                              int left, int top,
                              int num_classes, float threshold) {
    // Expected layout: 8400 candidates, each [cx, cy, w, h, score_0 .. score_{num_classes-1}].
    float *data = dataSource;
    std::vector<BoxInfo> result;
    for (int i = 0; i < 8400; ++i) {
        // Arg-max over the class scores.
        float maxScore = 0;
        int maxClass = -1;
        for (int cls = 0; cls < num_classes; cls++) {
            float score = data[cls + 4];
            if (score > maxScore) {
                maxScore = score;
                maxClass = cls;
            }
        }
        if (maxScore > threshold) {
            // Undo the letterbox: subtract padding, divide by the resize scale,
            // then clamp to the original frame.
            BoxInfo box;
            float w = data[2];
            float h = data[3];
            box.x1 = std::max(0, std::min(frame_size.width,
                                          int((data[0] - w / 2.f - left) / im_scale)));
            box.y1 = std::max(0, std::min(frame_size.height,
                                          int((data[1] - h / 2.f - top) / im_scale)));
            box.x2 = std::max(0, std::min(frame_size.width,
                                          int((data[0] + w / 2.f - left) / im_scale)));
            box.y2 = std::max(0, std::min(frame_size.height,
                                          int((data[1] + h / 2.f - top) / im_scale)));
            box.score = maxScore;
            box.label = maxClass;
            result.push_back(box);
        }
        data += 84; // 4 box values + 80 class scores per candidate
    }
    return result;
}
```
decode_inferV8 and ProcessImgYoloV8 run correctly for YOLOv8 on the GPU, so they should not be the source of the bug.
I also tried running YOLOv5 on the Xiaomi 14, and it works well on both GPU and DSP.
The model is the same, but the results are different. It is so strange that I hope to get your help. Thanks.
By the way, I added the GPU and DSP float raw results to https://github.com/lsCoding666/SNPE-YOLOV8-MODEL/
If you compare these files, you can see the detected person positions are very close, but the DSP scores are all 0.
https://github.com/lsCoding666/SNPE-YOLOV8-MODEL/blob/main/dsp_gpu_resul...
Bug fixed. I found the problem is in the last step of the model: the Concat.
Concatenating the locations and scores into one result tensor has a bug that makes the scores all 0.
The solution is to delete the Concat, so the model has two outputs: one for the box positions and the other for the scores.