Hello everyone,
I'm currently trying to execute some native OpenCL algorithms on an Adreno 330 device. The code is auto-generated and uses a lot of private variables. Smaller algorithms run fine, but I'm having trouble with bigger ones. I've managed to reduce the code to a small (and dumb) example that hangs during execution.
Here is pseudocode of what I'm trying to do:
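// Host side: input = 1000, copied into the input buffer
// OpenCL code (algo.cl)
__kernel void Algo(__global const int *input, __global int *output)
{
    int v1, v2, /* ..., */ v511, v512, v513, v514;
    v511 = 1;
    *output = *input;
}
// End of OpenCL code
// Host side
clFinish(queue);
// print output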
If my OpenCL code writes to the 511th or the 512th private variable, clFinish() blocks and never returns. If I write to v510 or v513 instead, execution is fine. Declaring the variables as __local also works, but that is not a valid solution for my needs.
Attached are the host.c and algo.cl files I use for my tests.
Thank you for your help!
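Because the kernel is auto-generated, the reduced case can also be regenerated for any variable count, which makes it easy to probe the v510/v511/v512/v513 boundary one index at a time. A minimal sketch, assuming a hypothetical helper `make_test_kernel` (not part of the original host.c) that writes the kernel source into a caller-supplied buffer; the `__global const int *input` / `__global int *output` signature is also an assumption mirroring the pseudocode. The resulting string could then be passed to clCreateProgramWithSource as usual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a reduced test kernel that declares n_vars
   private ints (v1..vN) and assigns 1 to v<target>, mirroring the pseudocode.
   Returns the number of characters written, or -1 if buf is too small. */
static int make_test_kernel(char *buf, size_t cap, int n_vars, int target)
{
    size_t used;
    int n;

    assert(buf != NULL);
    assert(n_vars >= 1 && target >= 1 && target <= n_vars);

    /* Kernel signature (assumed) and the start of the declaration line. */
    n = snprintf(buf, cap,
                 "__kernel void Algo(__global const int *input, __global int *output)\n"
                 "{\n    int");
    if (n < 0 || (size_t)n >= cap) return -1;
    used = (size_t)n;

    /* Emit " v1, v2, ..., vN" one variable at a time. */
    for (int i = 1; i <= n_vars; i++) {
        n = snprintf(buf + used, cap - used, "%s v%d", i == 1 ? "" : ",", i);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }

    /* Write to the chosen variable, then copy input to output. */
    n = snprintf(buf + used, cap - used,
                 ";\n    v%d = 1;\n    *output = *input;\n}\n", target);
    if (n < 0 || (size_t)n >= cap - used) return -1;
    used += (size_t)n;
    return (int)used;
}
```

For example, calling `make_test_kernel(buf, sizeof buf, 514, 511)` with an 8 KiB buffer reproduces the hanging case, and changing the last argument to 510 or 513 reproduces the working ones.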
Have you checked CL_DEVICE_LOCAL_MEM_SIZE? What does it report? Is sizeof(int)*512 smaller than that?
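The comparison suggested in the reply is simple arithmetic: OpenCL C defines `int` as exactly 32 bits, so 512 ints take 2048 bytes and the 514 declared variables take 2056 bytes. A minimal sketch of that check, assuming the device limit has already been read on the host with `clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE, sizeof(cl_ulong), &value, NULL)`; the helper names `int_vars_bytes` and `int_vars_fit` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OpenCL C fixes int at 32 bits (4 bytes), independent of the host's sizeof(int). */
#define OPENCL_INT_BYTES 4u

/* Hypothetical helper: bytes occupied by n_vars OpenCL C int variables. */
static uint64_t int_vars_bytes(size_t n_vars)
{
    return (uint64_t)n_vars * OPENCL_INT_BYTES;
}

/* Hypothetical helper: local_mem_bytes is meant to be the cl_ulong value
   returned for CL_DEVICE_LOCAL_MEM_SIZE. Returns 1 if n_vars ints fit, else 0. */
static int int_vars_fit(uint64_t local_mem_bytes, size_t n_vars)
{
    return int_vars_bytes(n_vars) <= local_mem_bytes;
}
```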