Using the Qualcomm® Hexagon™ SDK for On-Device Speech Processing

Thursday 10/15/15 12:12pm | Posted By Kuntal Sampat

Ever since Qualcomm Technologies first announced always-on device activation in early 2013, we have seen an industry-wide adoption of always-on voice services. These services rely on waking up the device using a preset phrase such as “Hey, Snapdragon,” and then sending raw speech to the cloud, where actual recognition is done and the response formulated.

This raises a number of legitimate privacy questions:

  1. How long is the voice recording stored in the cloud?
  2. Is the recorded voice available to a human listener, such as a software developer, who needs to use it for legitimate work on the application?
  3. Are there sufficient safeguards to prevent somebody from tracing the recorded voice back to a specific user?
  4. Apart from the user saying the wake-up phrase, is any background audio also stored in the cloud and how is such data used?
  5. Is the voice data encrypted in the cloud?

Several companies have been trying to respond to these and similar privacy questions. I believe that performing speech recognition and response formation on the device, instead of in the cloud, is one way to mitigate privacy and security concerns.

The Qualcomm® Hexagon™ DSP SDK is designed to allow OEMs and ISVs to write fully capable, low-power speech recognition and synthesis applications that obviate the need to send voice samples to the cloud. Registered developers for the Hexagon SDK also get access to a text-to-speech framework to accelerate application development. This gives OEMs and ISVs the option of handling some speech recognition entirely on the device, with no cloud involvement for those requests.

Some of the most common queries can be answered with information available on the device itself, without the need for network connectivity. For instance, queries about a personal schedule can be answered by accessing the on-device calendar. The Hexagon SDK also provides capabilities to convert many spoken queries into text before sending them over the network. The privacy benefit is that only text leaves the device, so no voice recording is collected in the cloud.
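To make the idea concrete, here is a minimal sketch of that routing decision, written in plain Python rather than against the Hexagon SDK. It assumes the speech has already been converted to text on the device. Every name in it (`CalendarEvent`, `answer_query`, `send_text_to_cloud`) is hypothetical and chosen for illustration only; none of them come from the SDK.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class CalendarEvent:
    """Hypothetical stand-in for an entry in the on-device calendar."""
    day: date
    title: str


def answer_query(
    text: str,
    calendar: List[CalendarEvent],
    today: date,
    send_text_to_cloud: Callable[[str], str],
) -> str:
    """Answer from on-device data when possible; otherwise forward text only.

    `text` is assumed to be the output of on-device speech recognition,
    so no audio ever reaches `send_text_to_cloud` (a hypothetical callback).
    """
    normalized = text.lower()
    # Very naive intent check, for illustration only.
    if "schedule" in normalized or "calendar" in normalized:
        titles = [event.title for event in calendar if event.day == today]
        if titles:
            return "Today: " + ", ".join(titles)
        return "Nothing scheduled today."
    # Not answerable locally: send the transcribed text, never the recording.
    return send_text_to_cloud(text)
```

A schedule question is then answered without touching the network, while anything else goes out as text:

```python
events = [CalendarEvent(date(2015, 10, 15), "Design review")]
print(answer_query("What is my schedule today?", events,
                   date(2015, 10, 15), lambda t: "cloud answer"))
# prints "Today: Design review"
```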

Thoughts? Let me know what you think in the comments below.

Comments

Re: Using the Qualcomm® Hexagon™ SDK for On-Device Speech...

Can you program the DSP on any smartphone with the proper Snapdragon chipset, or do you need to buy the dev kit? Will apps relying on this technology only work if an OEM includes my code on their devices? I am a researcher working on a project, and I am interested in using the Hexagon SDK to test novel machine learning algorithms. I have requested access to the SDK, but maybe I do not have the profile required to develop on this platform.

Re: Using the Qualcomm® Hexagon™ SDK for On-Device Speech...

I know it has been a year since your question, but I think someone else might be interested, like me, as I am investigating the possibility as well.

I believe you can build your code with the SDK and build your own app that runs the code through the DSP, but if you need a platform you can always use open-source AOSP to build your own version of Android, or the easy way is to put your code into a third-party ROM like CyanogenMod (you can build it on your own, away from the official channel, if you're concerned about keeping your code on lock for testing)...

This is IMO for now as I'm still researching this topic and the available options for individual / non-OEM development!