Forums - Voice(speech) Activity detection

Join Date: 27 Jun 14
Posts: 7
Posted: Mon, 2014-10-13 06:05

Hi, I am working on a voice-related app and want to detect the presence of voice (speech) activity. I have tried open-source projects such as WebRTC, PocketSphinx, the Speex library, and others, but the results are not satisfactory: some give good results at high SNR but fail to detect speech at low SNR. Is there any framework for the Hexagon DSP with which we can check whether captured/recorded audio (on Android, MSM8974) is speech/voice or noise?

Join Date: 14 Oct 14
Posts: 3
Posted: Tue, 2014-10-14 23:08


There is no such VAD framework available to Hexagon SDK users.



Join Date: 30 Jun 17
Posts: 2
Posted: Sat, 2017-07-01 00:15

Did you find any information on how to specify a sound model?
Some Android devices support this feature; for example, the Google Pixel reacts to the keyphrase "Ok Google". In the Android source code, I found the code responsible for loading the keyphrase into the DSP (the Hexagon DSP is built into the Qualcomm processor):

The sound model description structure sound_trigger_sound_model is passed to the method stdev_load_sound_model. The sound model structure:

/*
 * Base sound model descriptor. This struct is the header of a larger block passed to
 * load_sound_model() and containing the binary data of the sound model.
 * Proprietary representation of users in binary data must match information indicated
 * by users field
 */
struct sound_trigger_sound_model {
    sound_trigger_sound_model_type_t type;        /* model type, e.g. SOUND_MODEL_TYPE_KEYPHRASE */
    sound_trigger_uuid_t             uuid;        /* unique sound model ID */
    sound_trigger_uuid_t             vendor_uuid; /* unique vendor ID. Identifies the engine the
                                                     sound model was built for */
    unsigned int                     data_size;   /* size of opaque model data */
    unsigned int                     data_offset; /* offset of opaque data start from head of struct
                                                     (e.g. sizeof struct sound_trigger_sound_model) */
};
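As I read the comments in that header, the block handed to load_sound_model() is this struct followed by the opaque model bytes, with data_offset pointing at where those bytes start. Here is a rough sketch of packing such a block. The typedefs are simplified local stand-ins so the snippet compiles outside the Android tree (the real definitions come from the platform's sound trigger header), and pack_sound_model is a hypothetical helper name of mine, not an Android API. It says nothing about what goes inside the opaque bytes; that format is vendor-specific.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified local stand-ins (assumption): in a real build these
 * types come from the Android sound trigger header. */
typedef enum {
    SOUND_MODEL_TYPE_UNKNOWN = -1,
    SOUND_MODEL_TYPE_KEYPHRASE = 0
} sound_trigger_sound_model_type_t;

typedef struct {
    unsigned int   timeLow;
    unsigned short timeMid;
    unsigned short timeHiAndVersion;
    unsigned short clockSeq;
    unsigned char  node[6];
} sound_trigger_uuid_t;

struct sound_trigger_sound_model {
    sound_trigger_sound_model_type_t type;
    sound_trigger_uuid_t             uuid;
    sound_trigger_uuid_t             vendor_uuid;
    unsigned int                     data_size;
    unsigned int                     data_offset;
};

/* Hypothetical helper: allocate one contiguous block holding the
 * header followed by a copy of the opaque model data.
 * Returns NULL on allocation failure; the caller frees the result. */
struct sound_trigger_sound_model *
pack_sound_model(sound_trigger_sound_model_type_t type,
                 const sound_trigger_uuid_t *uuid,
                 const sound_trigger_uuid_t *vendor_uuid,
                 const void *data, unsigned int data_size)
{
    const size_t header_size = sizeof(struct sound_trigger_sound_model);
    struct sound_trigger_sound_model *model =
        calloc(1, header_size + data_size);
    if (model == NULL)
        return NULL;

    model->type = type;
    model->uuid = *uuid;
    model->vendor_uuid = *vendor_uuid;
    model->data_size = data_size;
    /* Opaque data starts right after the header in this sketch. */
    model->data_offset = (unsigned int)header_size;

    if (data_size > 0)
        memcpy((unsigned char *)model + model->data_offset, data, data_size);
    return model;
}
```

Note that the header comment mentions a "users field" that this base struct does not have, so a keyphrase model presumably needs a larger wrapper struct around this header; the sketch only covers the base layout shown above.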

Does anyone know how to generate the binary data for a sound model, or where to find information about it?

You can download the sound model for the keyphrase "Ok Google" by link: I loaded it into the DSP and it works.

Helpful Android classes:

