We have an Android service running on the Snapdragon platform where performance and power are of top concern. We are considering using FastCV to optimize portions of our algorithm. How we go about this depends on what FastCV can offer over what we already have or alternatives like OpenCL or RenderScript.
I have dug through the forums, but have been unable to find much detail on performance or the underlying architecture. Ideally we would have a way to estimate cycle counts, but for now I'd settle for knowing which processor each call runs on. Of the routines listed at https://developer.qualcomm.com/docs/fastcv/api/group__image__processing.... roughly what proportion run on the DSP vs the GPU or CPU?
From an earlier post, I gather that FastCV does not provide access to the ISP. If most of these calls run on the CPU, then this may not provide much of an advantage over a NEON implementation. If the image-processing routines run on the Adreno GPU, we're likewise concerned that our service may hurt an application that it is serving due to GPU context-switching.
Your comments are very insightful. We face many design constraints in developing a CV application. In FastCV you can call fcvSetOperationMode to select a mode of operation, which should alleviate some of these concerns. In general, FASTCV_OP_LOW_POWER mode directs functions to their DSP implementations, while FASTCV_OP_PERFORMANCE mode uses whichever processor has the fastest implementation.
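To make the mode selection concrete, here is a minimal sketch in C of how a service might pick a mode once at startup. Only fcvSetOperationMode, FASTCV_OP_LOW_POWER and FASTCV_OP_PERFORMANCE come from the discussion above. The workload_kind type and the choose_mode and configure_fastcv helpers are hypothetical names made up for illustration, and the check for a zero return value assumes that fcvSetOperationMode reports success as 0. When fastcv.h is not on the include path, the block falls back to stand-in declarations so it still compiles; those stand-ins do not reflect real FastCV behaviour.

```c
#include <stdio.h>

/* Use the real SDK header when it is available on the include path. */
#if defined(__has_include)
#  if __has_include("fastcv.h")
#    include "fastcv.h"
#    define HAVE_FASTCV 1
#  endif
#endif

#ifndef HAVE_FASTCV
/* Stand-in declarations (ASSUMPTION) so the sketch builds without the SDK.
   They reuse the names from the discussion but do nothing useful. */
typedef enum {
    FASTCV_OP_LOW_POWER,
    FASTCV_OP_PERFORMANCE
} fcvOperationMode;

static int fcvSetOperationMode(fcvOperationMode mode)
{
    (void)mode;
    return 0; /* assumed success code */
}
#endif

/* Hypothetical classification of what the service is doing. */
typedef enum {
    WORKLOAD_BACKGROUND,       /* power matters more than latency */
    WORKLOAD_LATENCY_CRITICAL  /* latency matters more than power */
} workload_kind;

/* Map the workload to one of the two modes described above. */
fcvOperationMode choose_mode(workload_kind kind)
{
    return (kind == WORKLOAD_LATENCY_CRITICAL) ? FASTCV_OP_PERFORMANCE
                                               : FASTCV_OP_LOW_POWER;
}

/* Apply the chosen mode; returns the status from fcvSetOperationMode. */
int configure_fastcv(workload_kind kind)
{
    int status = fcvSetOperationMode(choose_mode(kind));
    if (status != 0) {
        /* Assumes a non-zero status means the mode was not applied. */
        fprintf(stderr, "fcvSetOperationMode failed with status %d\n", status);
    }
    return status;
}
```

A background service that worries about disturbing the foreground application would call configure_fastcv(WORKLOAD_BACKGROUND) before its first FastCV call. Whether a given routine then actually runs on the DSP still depends on the device and on the routine, so it is worth measuring on the target hardware.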