OpenCL Optimization: Stop Leaving Compute Cycles on the Table