We have two main questions about FastRPC-based applications (as I understand it, there is no other way to communicate with the DSP when using FastCV?):
1) Is there a way to work around the lack of an "inout" IDL parameter? We need a bidirectional shared memory buffer. A small test shows that the following is possible:
mystruct_t mystruct = { ... };
mydsp_func(&mystruct, &mystruct);
where the IDL declares: mydsp_func(in mystruct_t in_mystruct, rout mystruct_t out_mystruct)
But with this hack there are really two different copies of mystruct, and the mydsp_func() implementation receives two different pointers.
2) Is there a way to work around the synchronous nature of FastRPC? We want to avoid implementing a scheduler on the client side that calls some DSP interface function many, many times through FastRPC; perhaps this could be done with a non-blocking call and an asynchronous response from the DSP side?
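In your implementation, do the following:
#include <string.h>

int mydsp_func(const mystruct_t* sin, mystruct_t* sout) {
    /* Copy the input over only when the two parameters are distinct buffers. */
    if (sout != sin) {
        memmove(sout, sin, sizeof(*sout));
    }
    do_something(sout);
    return 0;
}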
Parameters are split into in/rout because the communication channel to the aDSP can overlap cache flushing and invalidation for input and output parameters between the ARM and the aDSP. So an inrout structure would essentially have to be split into two: one to synchronize all the inputs, and one to invalidate all the outputs.
If the structure is simple and the memory comes from a buffer that can be shared with the aDSP (such as one from the ION allocator on Android), then the pointers will be equal and the memmove will not be necessary.
IDL doesn't support this natively because, for a complex structure, it would have to traverse all the members of the structure and do a test and copy for each. This is inefficient, and having this feature in IDL would only hide the complexity and inefficiency of the operation.
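To make the two cases concrete, here is a minimal host-side sketch in plain C; it does not use FastRPC itself. The layout of mystruct_t, the body of do_something(), and the helpers call_aliased() and call_marshalled() are all assumptions made up for illustration. call_marshalled() only imitates the situation where the implementation is handed two separate copies.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical payload; the real mystruct_t is not shown in this thread. */
typedef struct {
    int counter;
    int values[4];
} mystruct_t;

/* Stand-in for the real DSP-side work: modifies the struct in place. */
static void do_something(mystruct_t *s) {
    s->counter += 1;
    for (int i = 0; i < 4; ++i) {
        s->values[i] *= 2;
    }
}

/* Implementation for: mydsp_func(in mystruct_t, rout mystruct_t). */
int mydsp_func(const mystruct_t *sin, mystruct_t *sout) {
    assert(sin != NULL && sout != NULL);
    if (sout != sin) {
        /* Distinct buffers: seed the output with the input first. */
        memmove(sout, sin, sizeof(*sout));
    }
    do_something(sout);
    return 0;
}

/* Case 1 (hypothetical helper): both parameters are the same pointer,
 * so the memmove is skipped. */
int call_aliased(mystruct_t *s) {
    return mydsp_func(s, s);
}

/* Case 2 (hypothetical helper): imitate a transport that hands the
 * implementation separate input and output copies. */
int call_marshalled(mystruct_t *s) {
    mystruct_t in_copy = *s;
    mystruct_t out_copy;
    memset(&out_copy, 0, sizeof(out_copy));
    int rc = mydsp_func(&in_copy, &out_copy);
    if (rc == 0) {
        *s = out_copy;  /* copy the result back to the caller */
    }
    return rc;
}
```

Both helpers leave the caller's struct in the same final state; the only difference is whether the extra copy inside mydsp_func() is taken.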
Both buffers (sin and sout) are in ION and are really the same pointer. But:
1) they are not equal!
2) their contents are not the same!
And yes, I do a memcpy(..) from one to the other, but this seems like a bad trick for performance :(
One more thing. Mapping overlapping addresses into the same virtual address space on the aDSP only works for sequence buffers, not simple parameters. Simple parameters, whose size is known at compile time, get coalesced into a single input buffer and a single output buffer.
So
long myfunc(in long a, in long b, in sequence<char> buf, rout long c)
or in C
int myfunc(int a, int b, char* buf, int bufLen, int* c)
is packed into 3 buffers:
1) all the statically known inputs: a, b, bufLen
2) the runtime input buffer: buf
3) the statically known outputs: c
This ends up being considerably faster than mapping each parameter directly:
1) For stubs and skels, copying objects whose size is statically known is really fast; compilers can inline the code to do so without calling out to memmove.
2) The kernel driver can work over a smaller set of pages, which reduces the amount of work it has to do and reduces the number of mappings to the aDSP.
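As a rough illustration of the three-buffer packing for myfunc, here is a host-side sketch in plain C. The struct names, myfunc_stub(), myfunc_skel(), and the body of myfunc() are hypothetical; the code generated by the real IDL compiler will look different, and the direct call from stub to skel merely stands in for the remote invocation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Buffer 1 (assumed layout): all inputs whose size is known at compile time. */
typedef struct {
    int32_t a;
    int32_t b;
    int32_t bufLen;
} myfunc_prim_in_t;

/* Buffer 3 (assumed layout): all outputs whose size is known at compile time. */
typedef struct {
    int32_t c;
} myfunc_prim_out_t;

/* Hypothetical implementation: sums a, b and the bytes of buf into *c. */
int myfunc(int a, int b, char *buf, int bufLen, int *c) {
    int sum = a + b;
    for (int i = 0; i < bufLen; ++i) {
        sum += (unsigned char)buf[i];
    }
    *c = sum;
    return 0;
}

/* Skel side: unpack the coalesced inputs, call the implementation,
 * and pack the output. Buffer 2 (buf) is passed through as-is. */
int myfunc_skel(const myfunc_prim_in_t *pin, char *buf, myfunc_prim_out_t *pout) {
    assert(pin != NULL && pout != NULL);
    int c = 0;
    int rc = myfunc(pin->a, pin->b, buf, pin->bufLen, &c);
    pout->c = c;
    return rc;
}

/* Stub side: coalesce the small parameters with plain struct assignments,
 * which a compiler can inline without calling memmove. */
int myfunc_stub(int a, int b, char *buf, int bufLen, int *c) {
    myfunc_prim_in_t pin = { a, b, bufLen };
    myfunc_prim_out_t pout = { 0 };
    /* In a real system this would be the remote invoke, not a direct call. */
    int rc = myfunc_skel(&pin, buf, &pout);
    if (rc == 0) {
        *c = pout.c;
    }
    return rc;
}
```

In this sketch only three regions cross the boundary regardless of how many scalar parameters there are: the input struct, buf, and the output struct.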
Thank you, Анатолий! :)