UDL code running on GPU/DSP
zwave
Join Date: 1 Aug 17
Posts: 2
Posted: Thu, 2017-10-19 12:26

Do the SNPE APIs support running User Defined Layer (UDL) code on GPUs/DSPs? Is it possible to run OpenCL code for a UDL (instead of code running on the CPU)? If this is not currently supported, do you have plans to support this feature in future releases?

 

bspinar
Join Date: 4 Feb 15
Posts: 21
Posted: Wed, 2017-11-01 16:08

Yes. The UDL code is not limited to the CPU. You can write your UDL in OpenCL and run it on the GPU if you want. In principle you can also run it on the DSP, but this may be difficult, if not prohibited, depending on your device, because production devices require DSP code to be signed before it can execute.

lubomir
Join Date: 26 Jul 17
Posts: 1
Posted: Thu, 2017-11-02 19:27

Thank you for your reply. What we are hoping to do is run UDL layers on the GPU without a performance penalty.

Is the GPU and CPU memory shared? If not, it seems UDL layers have a large performance penalty as the memory needs to be copied to the CPU and back, correct? Any way to avoid this?

Lubomir
 

Rex
Join Date: 8 Aug 15
Posts: 45
Posted: Thu, 2017-11-02 19:57

"Is the GPU and CPU memory shared? If not, it seems UDL layers have a large performance penalty as the memory needs to be copied to the CPU and back, correct? Any way to avoid this?"

Almost all MSM devices use unified memory. The GPU requires a specific type of that unified memory (ION memory). So if you allocate your buffers correctly, SNPE should work fine on the GPU and your GPU shader can use the memory.

Rex

bspinar
Join Date: 4 Feb 15
Posts: 21
Posted: Fri, 2017-11-03 09:14

I understand the request, and (as Rex points out) there are ways, such as ION buffers, to avoid copies between the SNPE GPU runtime and the UDL "runtime" (if the UDL is coded for the GPU). Unfortunately, at this point SNPE will always copy the data between the GPU runtime and the UDL callback, in both directions. FWIW, the CPU runtime does not do this copy.

The UDL feature was originally intended as a prototyping/experimental feature, as opposed to a highly performant "production" way to run layers that SNPE does not support. Your request for a more optimal way to use the SNPE GPU runtime with GPU UDLs has been echoed by other users (including requests to share the GPU context between the two). In light of these requests, we are considering ways to optimize this in future versions of SNPE, but it's our policy not to provide roadmap or future feature timeline predictions on the forum, so I can't say if/when these kinds of optimizations will show up.

Thanks again for the questions.

Rex
Join Date: 8 Aug 15
Posts: 45
Posted: Fri, 2017-11-03 10:07

"SNPE will always copy the data..."

Does this mean the function SNPEBuilder& setUseUserSuppliedBuffers(bool bufferMode) has no effect on memory copies to/from the GPU context? As you state, memory is always copied when using the GPU.

Thanks.

Rex

bspinar
Join Date: 4 Feb 15
Posts: 21
Posted: Tue, 2017-11-07 09:15

My "always copy" comment was intended only for UDL layers. For user-supplied buffers, how much copying happens varies. On the GPU runtime, before user buffers were available, there were often two copies on the input (one done by the user to get data into the input tensor and one to move the data to a buffer usable by the GPU), and the same in reverse on the output. With user buffers, the copy into/out of the tensor is removed, but the GPU copy is still there. Our goal is to remove that "second" copy as well, but the hooks aren't yet in place in the commercial code to eliminate it (e.g. allowing the user to pass in an ION buffer, or other mechanisms to make it go away). I can't comment on the timeline for such a feature, but it is certainly a useful one to add, and we are looking into it.
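
As a rough illustration of the copy counts described above for the GPU runtime, here is a toy C++ model. It is not SNPE code and calls no SNPE API: BufferMode, copiesPerDirection and copiesPerInference are hypothetical names made up for this sketch, and the numbers simply restate the "often two copies" description in the post, so treat them as typical counts rather than guarantees.

```cpp
// Toy model of host-side copies around one GPU-runtime inference.
// Hypothetical names throughout; this only restates the counts from the post.

enum class BufferMode {
    ITensor,    // user copies data into/out of an SNPE tensor
    UserBuffer  // user-supplied buffers: the tensor copy is skipped
};

// Copies on one side (input OR output) of the network.
// ITensor:    1 copy into/out of the tensor + 1 copy to/from a GPU-usable buffer.
// UserBuffer: only the copy to/from the GPU-usable buffer remains.
constexpr int copiesPerDirection(BufferMode mode) {
    return mode == BufferMode::ITensor ? 2 : 1;
}

// Input and output behave symmetrically ("the same in reverse on the output").
constexpr int copiesPerInference(BufferMode mode) {
    return 2 * copiesPerDirection(mode);
}

// Compile-time checks of the model.
static_assert(copiesPerInference(BufferMode::ITensor) == 4,
              "two copies in, two copies out");
static_assert(copiesPerInference(BufferMode::UserBuffer) == 2,
              "one GPU copy in, one GPU copy out");
```

Under these assumptions, user buffers halve the number of copies per inference, and the remaining GPU copy on each side is the one that would need something like an ION buffer hook to go away.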

gesqdn-forum
Join Date: 4 Nov 18
Posts: 184
Posted: Tue, 2019-06-11 04:29

Hi,
Kindly follow the instructions in the SNPE documentation to create a User Defined Layer (UDL).
To build the C++ application with a UDL, use the "Building and Running on x86 Linux and Embedded Linux" section from this link.
The executable will be in <snpe_sdk>/examples/NativeCpp/UdlExample/obj/local/x86_64-linux-clang.

The command below shows how to select different runtimes when executing the application:
 

$ ./snpe-net-run-udl --help
