Qualcomm Keyword Speech Dataset

Keyword spotting (KWS) is now widely used to detect specific keywords on personal devices such as mobile phones and home appliances. A keyword may consist of multiple words; “Hey Siri”, “Ok Google”, and “Hi Bixby” are well-known examples.

Many such keywords are branded by specific companies, which have shown great interest in the KWS task for their own products. These companies have proposed various KWS approaches, but the approaches rely on proprietary keyword datasets that are not accessible to others. As a result, they cannot be reproduced by others and are hard to compare with one another.

To address this issue, we are publishing a keyword dataset for our Qualcomm® Snapdragon™ mobile platform, which we have named the ‘Hey Snapdragon Keyword Dataset’.

Download ZIP file

Download TAR file

The Hey Snapdragon Keyword Dataset was used to support the experimental results in our paper at ASRU 2019: Query-by-example on-device keyword spotting [1]. We hope that this new dataset will be helpful for reproducible KWS research.

The dataset contains 4,270 utterances of four English keywords spoken by 50 people. The four keywords are Hey Android, Hey Snapdragon, Hi Galaxy, and Hi Lumina. The following table shows the details for each keyword. Each wav file was recorded at a sampling rate of 16 kHz, in mono, with a bit depth of 16 bits.

The directory structure is ‘keyword/speaker_ID/utterance.wav’. Speaker IDs are shared across keywords, so the same ID refers to the same speaker under every keyword directory.
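The layout and audio format described above can be checked with a short script. The following is a minimal sketch, not an official loader: it assumes the archive has been extracted so that the keyword directories sit directly under a root folder, and the function names `index_dataset` and `has_expected_format` are hypothetical helpers introduced here for illustration. It uses only the Python standard library.

```python
import wave
from collections import defaultdict
from pathlib import Path


def index_dataset(root):
    """Map keyword -> speaker_ID -> sorted list of wav paths.

    Assumes the 'keyword/speaker_ID/utterance.wav' layout directly under root.
    """
    index = defaultdict(lambda: defaultdict(list))
    for wav_path in sorted(Path(root).glob("*/*/*.wav")):
        keyword, speaker_id = wav_path.parts[-3], wav_path.parts[-2]
        index[keyword][speaker_id].append(wav_path)
    return index


def has_expected_format(wav_path, rate=16000, channels=1, sample_width=2):
    """Return True if the file is 16 kHz, mono, 16-bit (2 bytes per sample)."""
    with wave.open(str(wav_path), "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getnchannels() == channels
            and wav.getsampwidth() == sample_width
        )


def utterances_by_speaker(index, speaker_id):
    """Collect one speaker's files across keywords (speaker IDs are shared)."""
    return {
        keyword: speakers[speaker_id]
        for keyword, speakers in index.items()
        if speaker_id in speakers
    }
```

Calling `index_dataset` on the extracted root folder (whatever path you chose when unpacking) and summing the list lengths gives a quick way to compare the file count against the utterance total stated above. Because speaker IDs are shared across keywords, `utterances_by_speaker` can group one person's recordings of all keywords they spoke.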

The dataset is intended for research purposes only. Please cite our paper [1] if you use this dataset in your research:

Author = {Byeonggeun Kim and Mingu Lee and Jinkyu Lee and Yeonseok Kim and Kyuwoong Hwang},
Title = {Query-by-example on-device keyword spotting},
Year = {2019},
Eprint = {arXiv:1910.05171},

[1] Byeonggeun Kim, Mingu Lee, Jinkyu Lee, Yeonseok Kim, and Kyuwoong Hwang, “Query-by-example on-device keyword spotting,” to be published in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), Sentosa, Singapore, Dec. 2019.