Forums - Strange Behavior With OpenCL on Adreno 320

Strange Behavior With OpenCL on Adreno 320

rhameed

Join Date: 27 May 14

Posts: 4

Posted: Tue, 2014-07-08 17:55

I am currently porting a video processing algorithm, which we had previously implemented using DirectX Compute Shaders, to OpenCL on the Adreno 320 (Nexus 7). Right now I am battling an issue that has crippled my progress. When executing some of my OpenCL kernels, I get error code -54, which indicates an invalid work-group size. The problem, however, is that the work-group size in those instances is perfectly fine (16x8 = 128) and is the same as in other kernels that work. The error in fact depends on the kernel code: if I comment out most of the kernel code, the error goes away, and as I start adding functionality back, at some point it fails again with the invalid work-group size error.
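For reference, -54 is the value of CL_INVALID_WORK_GROUP_SIZE in the Khronos CL/cl.h header. The sketch below is a hypothetical helper (check_local_size is not an OpenCL function) that approximates, in simplified form, the local-size checks behind that error in OpenCL 1.x: each global dimension must be evenly divisible by the local size, and the product of the local sizes must not exceed the applicable maximum. The two macro values are defined locally so the sketch compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as in the Khronos CL/cl.h header; defined here so the
   sketch builds without an OpenCL SDK. */
#define CL_SUCCESS                  0
#define CL_INVALID_WORK_GROUP_SIZE -54

/* Hypothetical helper, not part of the OpenCL API: a simplified model of
   the local-size checks that make clEnqueueNDRangeKernel return -54 in
   OpenCL 1.x. max_wg is whichever limit applies (device or per-kernel). */
static int check_local_size(size_t work_dim, const size_t *global,
                            const size_t *local, size_t max_wg)
{
    size_t total = 1;
    assert(work_dim >= 1 && work_dim <= 3);
    for (size_t d = 0; d < work_dim; ++d) {
        /* Reject zero (avoids dividing by zero) and non-uniform groups:
           OpenCL 1.x requires global[d] to be a multiple of local[d]. */
        if (local[d] == 0 || global[d] % local[d] != 0)
            return CL_INVALID_WORK_GROUP_SIZE;
        total *= local[d];
    }
    /* Total work-items per group must fit the applicable maximum. */
    return total > max_wg ? CL_INVALID_WORK_GROUP_SIZE : CL_SUCCESS;
}
```

With global = {1920, 1080} and local = {16, 8}, check_local_size(2, global, local, 256) returns CL_SUCCESS, matching the device-level limit of 256 mentioned later in the thread; passing a lower, kernel-specific limit such as 64 makes the same 16x8 launch return -54.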

My initial hunch was that this error comes up when the threads have too many memory operations in flight. In one case I was able to make the error go away by inserting barriers in the code to ensure there are not too many outstanding memory transactions at any given time. In other cases even that didn't work, and in some cases even a fairly simple kernel triggers the error. I am really stuck on this, as I can't find any deterministic way to avoid it.

Are there any ideas / suggestions about this?

Re: Strange Behavior With OpenCL on Adreno 320 #1

rhameed

Join Date: 27 May 14

Posts: 4

Posted: Thu, 2014-07-10 11:01

A follow-up to my previous post:

After playing around more, I found that the maximum supported work-group size seems to go down depending on the contents of the kernel. One of my kernels (which uses local shared memory as well as atomic operations) runs fine as long as I keep the group size at 32 or smaller: 32x1, 8x4, 16x2 and 4x8 all work, as long as the total stays within 32. For another kernel, which uses multiple texture reads and shared local memory, the limit seems to be 64. For my simplest kernels, which just read a single texture element and write it back to another texture, the limit seems to be 128. According to the OpenCL device information queries, the device supports a work-group size of up to 256, but I always get an error if I try to go above 128, even with a basic pass-through kernel.

So, in summary, the driver seems to enforce different work-group size limits depending on the kernel contents. I am not sure why that is or whether it will be fixed, but for now at least I can move forward, albeit with a performance hit.

Re: Strange Behavior With OpenCL on Adreno 320 #2

Evgeniy P

Join Date: 11 Mar 14

Posts: 1

Posted: Sat, 2014-07-12 03:22

Hello, rhameed.

GPUs used as compute devices have a lot of limitations, especially embedded ones. One example is the number of GPU registers available for a kernel's private variables. If there are not enough registers to satisfy the kernel's needs and hold all of its live values, two common solutions are applied. Desktop GPUs usually use register spilling, where long-latency global memory (DDR) is used instead of GPU registers to store and operate on variables (with big performance drops). Another approach, more common on embedded devices, is to simply shrink the maximum work-group size. There are many other restrictions in mobile GPUs that could affect the maximum work-group size, such as a limit on kernel/shader binary size, but I'm not sure how they apply to Adreno GPUs. It seems that the low number of registers (compared to desktop graphics, of course) is what you are suffering from.
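The register argument can be made concrete with a rough model. In typical GPU implementations all work-items of a work-group are resident on one compute unit at the same time, so without spilling the largest group is roughly the register file size divided by the registers each work-item needs, capped by the device-wide maximum. The function name and every number below are hypothetical and chosen only for illustration; they are not Adreno 320 specifications.

```c
#include <assert.h>
#include <stddef.h>

/* Rough, hypothetical model of a register-limited work-group size.
   Assumes no register spilling: every work-item in the group keeps its
   live values in the compute unit's register file simultaneously. */
static size_t max_wg_from_registers(size_t regs_per_compute_unit,
                                    size_t regs_per_work_item,
                                    size_t device_max_wg)
{
    assert(regs_per_work_item > 0);
    /* How many work-items' register sets fit at once. */
    size_t fit = regs_per_compute_unit / regs_per_work_item;
    /* The result can never exceed the device-wide maximum. */
    return fit < device_max_wg ? fit : device_max_wg;
}
```

With a made-up register file of 4096 registers and a device maximum of 256, a kernel needing 8 registers per work-item is capped at 256, while 32, 64 and 128 registers per work-item allow 128, 64 and 32 respectively. Doubling per-work-item register use halves the largest group, which is consistent with limits that shrink as kernels grow more complex.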

In any case, according to the OpenCL specification, you can query information related to a specific kernel object and the device executing it using the clGetKernelInfo and clGetKernelWorkGroupInfo calls. In your case you need the CL_KERNEL_WORK_GROUP_SIZE parameter, which can be retrieved with clGetKernelWorkGroupInfo.
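A sketch of how the queried value might be used, assuming the kernel was tuned for a 16x8 tile. The comment shows the clGetKernelWorkGroupInfo call that would supply kernel_wg_size; the hypothetical fit_local_size helper (not part of OpenCL) then halves the larger dimension until the group fits. It is kept in plain C so it compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: shrink a preferred 2D local size until it fits the
   per-kernel limit. kernel_wg_size is assumed to come from:
     clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                              sizeof(size_t), &kernel_wg_size, NULL);
   Halving the larger dimension first keeps the tile roughly square.
   Assumes nonzero, power-of-two starting dimensions. */
static void fit_local_size(size_t kernel_wg_size, size_t local[2])
{
    assert(kernel_wg_size >= 1 && local[0] >= 1 && local[1] >= 1);
    while (local[0] * local[1] > kernel_wg_size) {
        if (local[0] >= local[1] && local[0] > 1)
            local[0] /= 2;
        else
            local[1] /= 2;
    }
}
```

Starting from {16, 8}, a limit of 128 leaves the tile unchanged, 64 gives 8x8 and 32 gives 4x8. Because the dimensions stay powers of two, a global size that was divisible by 16x8 remains divisible by the smaller tile, so the launch still satisfies OpenCL 1.x's uniform work-group rule.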

Opinions expressed in the content posted here are the personal opinions of the original authors, and do not necessarily reflect those of Qualcomm Incorporated or its subsidiaries (“Qualcomm”). The content is provided for informational purposes only and is not meant to be an endorsement or representation by Qualcomm or any other party. This site may also provide links or references to non-Qualcomm sites and resources. Qualcomm makes no representations, warranties, or other commitments whatsoever about any non-Qualcomm sites or third-party resources that may be referenced, accessible from, or linked to this site.
