Forums - How to run the NMS function on the GPU to speed up time on the RB5 dev kit, after inference using the model_quant.dlc file

How to run the NMS function on the GPU to speed up time on the RB5 dev kit, after inference using the model_quant.dlc file
nguy3nt4n99
Join Date: 14 Mar 24
Posts: 2
Posted: Sat, 2024-04-27 22:45

Hi everybody,

I was able to run yolov8n.dlc based on the code at https://github.com/quic/sample-apps-for-robotics-platforms/tree/master/R...
However, after inference completes, I have to run the NMS function on the CPU, which takes a lot of time. Is there a way to run this NMS function on the GPU?
I translated the original NMS function to C++ using the torch-cpu library.


Thanks,
Tan

jesustotten735
Join Date: 27 Jun 24
Posts: 1
Posted: Thu, 2024-06-27 21:46

Quote:
Hi everybody,

I was able to run yolov8n.dlc based on the code at https://github.com/quic/sample-apps-for-robotics-platforms/tree/master/R...
However, after inference completes, I have to run the NMS function on the CPU, which takes a lot of time. Is there a way to run this NMS function on the GPU?
I translated the original NMS function to C++ using the torch-cpu library.
YOLOv8's original NMS function is at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/utils/o...

Thanks,
Tan

I think you should modify your code to use the GPU version of the NMS function. You can use torchvision.ops.nms, which dispatches to an optimized CUDA kernel when its input tensors are on the GPU. First, import the necessary packages:

python
import torch
from torchvision.ops import nms
 
Then, replace the relevant part of your code where you perform NMS with the following lines:
python
# `boxes` is an (N, 4) tensor in (x1, y1, x2, y2) format;
# `scores` is the matching (N,) tensor of confidence scores
keep = nms(boxes, scores, iou_threshold)  # indices of boxes that survive NMS
boxes = boxes[keep]
scores = scores[keep]
 
Note: Make sure that the boxes and scores tensors are already on the GPU. You can move them to the GPU using the .to(device) method, where device is the CUDA device you want to use.

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries (“Qualcomm”). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.