Hello,
I just found out about QSML and am trying to replace our current Eigen-based implementation with QSML in our Android app. We're currently interested in only a single function, cblas_sgemm(), and I was wondering how to enable QSML's internal parallelism. This function is the core of running a convolutional neural network, and its performance is critical for us.
Thanks,
Ofri
Thanks Matthew for the quick response. Very helpful. Now I have two more questions:
1. I'm testing the library on my Nexus 6P (I believe it's a Snapdragon 810) and I'm observing extreme fluctuations in running times. The previous code, which was based on Eigen (we used serial sgemm with our own custom parallelization), took about 9.5 seconds to complete most of the time, with occasional runs of ~10.5 or ~8.5 seconds. So the Eigen-based code runs at 9.5 sec +/- 1 sec.
Now that I've switched the sgemm function from Eigen to QSML (letting QSML do its own parallelization), I'm getting anything between 4 sec and 10.7 sec. The algorithm is run many times and the actual running time varies greatly. The initial run is almost always quite fast (around 5.5 to 6.5 seconds), whereas subsequent runs are all over the scale, though they tend to slow down. Strangely, the Eigen-based code almost always took 9.5 sec. Pausing the algorithm for a short time and then starting again reproduces the same pattern: the first run is very fast, then things slow down. What could explain such behavior?
2. Since QSML is compatible only with Snapdragon processors, do you guys have a best practice for detecting compatible processors at runtime? We'd like to use QSML on compatible devices but fall back to Eigen for others.
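To make question 2 concrete, here is a minimal sketch of the kind of runtime check being asked about. It rests on an assumption that is not confirmed by Qualcomm or by anything in this thread: that /proc/cpuinfo on the device contains a "Hardware" line naming Qualcomm. Some kernels omit that line entirely, in which case this check reports "not compatible" and the app would fall back to Eigen. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the given cpuinfo text has a line
// starting with "Hardware" that mentions "Qualcomm" (case-insensitive).
// ASSUMPTION: the vendor name appears on that line; this is a heuristic,
// not a documented QSML compatibility test.
bool cpuinfoLooksLikeQualcomm(const std::string& cpuinfo) {
    std::istringstream in(cpuinfo);
    std::string line;
    while (std::getline(in, line)) {
        std::string lower(line);
        std::transform(lower.begin(), lower.end(), lower.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (lower.compare(0, 8, "hardware") == 0 &&
            lower.find("qualcomm") != std::string::npos) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: reads /proc/cpuinfo and applies the heuristic above.
// Returns false if the file cannot be opened, so the caller falls back.
bool deviceLooksLikeSnapdragon() {
    std::ifstream file("/proc/cpuinfo");
    if (!file) {
        return false;
    }
    std::stringstream buffer;
    buffer << file.rdbuf();
    return cpuinfoLooksLikeQualcomm(buffer.str());
}
```

The result of deviceLooksLikeSnapdragon() could be computed once at startup and used to pick between the QSML sgemm path and the existing Eigen path. Keeping the parsing separate from the file read makes the heuristic easy to test against cpuinfo dumps collected from different devices.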
Thanks again,
Ofri
Hey, thanks for this valuable reply.