It’s quite amazing to think the first voice-controlled user interface, the Audrey system, was built by Bell Labs back in 1952 – although it only recognized ten digits spoken by a single voice and was six feet tall. Fast forward to today, and thanks to advances in machine learning (ML) and speech recognition, voice user interfaces (VUIs) such as Amazon’s Alexa, Microsoft’s Cortana, and Apple’s Siri are now an integral part of our daily lives. Today’s VUIs can converse in real time and answer questions from the simple “What time is it?” to the more complex “Where is the nearest place I can go to update my driver’s license?” Used in everything from smartphones to smart homes, today’s VUIs are changing how we interact with technology and are bringing us closer to a voice-first era.
However, developing a high-performance custom VUI isn’t easy, particularly for embedded platforms in the product development phase. Sensory, a member of the Qualcomm® Advantage Network, figured it was time to change that. So they developed the VoiceHub platform, which is available through a convenient online portal that allows developers to quickly create ML models for wake words and voice control command sets for prototyping and proof-of-concept purposes.
In this blog, we’ll take a deeper look into Sensory, explore how VoiceHub works, run through an example of rapid VUI prototyping, and take a peek into what the future holds for VUI platforms.
Sensory, founded in 1994, is a technology development house that licenses embedded artificial intelligence (AI) to differentiate products by making them safer and easier to use. Sensory’s flexible wake word technology, small-to-large vocabulary speech recognition, and natural language processing (NLP) technologies are fueling today’s VUI revolution.
Sensory has pioneered neural network approaches for embedded speech recognition in consumer electronics. Additionally, its biometric recognition technologies are designed to make everything from unlocking a device to authenticating users for digital transactions faster, more secure, and more convenient.
Sensory's technologies are widely commercialized in areas including automotive, home appliances, home entertainment, IoT, mobile phones, and wearables. Their TrulyHandsfree and Sound ID technologies are utilized on the Qualcomm® QCC5100 series, our low-power Bluetooth audio SoC series designed for compact, feature-rich wireless earbuds, headsets, and speakers.
How VoiceHub Works
Sensory’s VoiceHub empowers developers by providing them with tools to create custom wake word and voice command set models for their applications. Todd Mozer, CEO of Sensory, believes offering VoiceHub free for developers will lead to innovation in the audio space. Since its release, VUI designers worldwide have used it to create voice AI models for automotive, wearable, smart speaker, and smart home products.
The great thing about VoiceHub is that a developer can create a project, train it, and have a deployable ML model within an hour of submitting it.
VoiceHub outputs wake word and voice command set models (powered by TrulyHandsfree). These models can then be used for quick prototyping by scanning a QR code in a companion Android application, or exported as deployable models for more advanced proof-of-concept testing on specific DSPs.
These tools also offer vast flexibility, allowing developers to create wake word models, either custom-branded or based on today’s most popular voice assistant platforms, and command set models targeting a desired memory footprint. This makes VoiceHub suitable for a wide range of applications, from ultra-low-power, resource-limited wearables to high-power, high-performance appliances on the edge.
The Challenge VoiceHub Solves
Sensory chose our advanced QCC5100 series SoCs, and we provided additional in-person support during the development of VoiceHub. As a result, developers can now implement a Sensory VUI and access TrulyHandsfree™, which is renowned for high accuracy at ultra-low power. The developer simply selects the output format from a drop-down menu in an easy-to-use, no-code UI.
Rapid VUI Prototyping Playbook
To showcase just how straightforward prototyping has become when VoiceHub is used to develop VUIs, Sensory has created a tutorial video that walks developers through the process of using the online portal to create the wake word and command set in minutes.
The Future of VUIs
Easier VUI development for SoCs such as the QCC5100 series via VoiceHub opens up possibilities such as multiple wake words controlling multiple voice assistants on a single device, for example a Bluetooth headset or wireless speaker. Sensory sees an increase in VUIs customized for specific brands and applications in the future, and also believes that the ability to develop multi-wake-word-enabled headsets and other small wireless devices could be crucial for the manufacturers and developers of these products.
It is estimated that the speech and voice recognition opportunity will be worth $24.9 billion by 2025, based largely on the growing impact of AI on the accuracy of speech and voice recognition. With the power of our QCC5100 series, VoiceHub’s wake word accuracy rate is high, nearing the performance of hand-tuned ML models.
The VoiceHub platform and its online developer portal make rapid prototyping easy and convenient, a capability highly valued by VUI developers and one that demonstrates why Sensory is recognized as a leader in this space.
Qualcomm QCC5100 and QCC5126 are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm Advantage Network is a program of Qualcomm Technologies, Inc. and/or its subsidiaries.