Qualcomm Keyword Speech Dataset

Keyword spotting (KWS) is now widely used to detect specific keywords on personal devices such as mobile phones and home appliances. A keyword may consist of multiple words; “Hey Siri”, “Ok Google”, and “Hi Bixby” are well-known examples.

Many such keywords are branded by specific companies, which have shown great interest in the KWS task for their own products. These companies have proposed various KWS approaches, but each relies on a proprietary keyword dataset that is not accessible to others. As a result, the approaches cannot be reproduced by others and are hard to compare with one another.

To address this issue, we are publishing a keyword dataset for our Qualcomm® Snapdragon™ mobile platform, which we have named the ‘Hey Snapdragon Keyword Dataset’.

Download ZIP file

Download TAR file

This Data Set is licensed by Qualcomm Technologies, Inc. under the following terms (“License”). If you are using the Data Set on behalf of your employer or another legal entity, you agree to these terms on their behalf as well as on your own behalf, and you represent that you have the legal authority to bind such employer or other legal entity to these terms. If you do not have such authority or you or they do not agree to these License terms, you and such entity may not use this Data Set and must delete all copies of it.

Terms:

1. Subject to and conditioned upon your compliance with the terms and conditions of this License, QTI grants to you solely under QTI’s copyrights in the Data Set to use the Data Set solely for internal research purposes and specifically not for any commercial purposes (the “Limited Purpose”). You shall not, without QTI’s prior written authorization, (i) use the Data Set for commercial, production or revenue generating purposes, or (ii) incorporate the Data Set into any other data set, or (iii) use the Data Set to train any commercial product. Except for the Limited Purpose, you shall not use the Data Set for any other purpose.

2. Subject to and conditioned upon your compliance with the terms and conditions of this License, redistribution and use of the Data Set with or without modification, are permitted.

3. Redistributions must reproduce the following copyright notice and these License terms in the Data Set file and in any documentation and/or other materials provided with the distribution.

Copyright (c) 2019 Qualcomm Technologies, Inc.

All rights reserved.

4. Neither the name of Qualcomm Technologies, Inc. nor the names of any contributors may be used to endorse or promote products derived from this Data Set without specific prior written permission.

5. NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS DATA SET IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DATA SET, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

We present the Hey Snapdragon Keyword Dataset, which was used to support the experimental results in our ASRU 2019 paper, “Query-by-example on-device keyword spotting” [1]. We hope that this new dataset will be helpful for reproducible KWS research.

The dataset contains 4,270 utterances of four English keywords spoken by 50 people. The four keywords are Hey Android, Hey Snapdragon, Hi Galaxy, and Hi Lumina. The following table shows the details for each keyword. Each wav file was recorded at a sampling rate of 16 kHz, in mono, with a bit depth of 16 bits.
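As a quick sanity check after downloading, the recording format stated above (16 kHz, mono, 16-bit) can be verified from each file's WAV header. The following is a minimal sketch using Python's standard `wave` module; the helper names `read_wav_format` and `matches_expected_format` are illustrative and are not part of the dataset release.

```python
import wave

# Format stated in the dataset description: 16 kHz, mono, 16-bit samples.
# 16 bits per sample corresponds to a sample width of 2 bytes.
EXPECTED_FORMAT = {
    "sample_rate": 16000,
    "channels": 1,
    "sample_width_bytes": 2,
}


def read_wav_format(path):
    """Read the sampling rate, channel count, and sample width from a WAV header."""
    with wave.open(str(path), "rb") as wav_file:
        return {
            "sample_rate": wav_file.getframerate(),
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
        }


def matches_expected_format(path):
    """Return True if the file's header matches the stated recording format."""
    return read_wav_format(path) == EXPECTED_FORMAT
```

Files that fail the check may have been resampled or converted after download, so it is worth running this before feature extraction.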

The directory structure is ‘keyword/speaker_ID/utterance.wav’. Speaker IDs are shared between keywords, so the same ID refers to the same speaker under every keyword.
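The layout described above can be indexed with a short script. This is a sketch under stated assumptions: the extracted archive is assumed to have the keyword folders directly under a root directory, and the actual folder and file names in the release are not specified here, so the code matches any names following the ‘keyword/speaker_ID/utterance.wav’ pattern. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def index_dataset(root):
    """Build a mapping: keyword -> speaker_ID -> sorted list of wav paths.

    Assumes `root` directly contains the keyword folders, i.e. files are at
    root/keyword/speaker_ID/utterance.wav.
    """
    index = defaultdict(lambda: defaultdict(list))
    for wav_path in sorted(Path(root).glob("*/*/*.wav")):
        keyword, speaker_id = wav_path.parts[-3], wav_path.parts[-2]
        index[keyword][speaker_id].append(wav_path)
    return index


def utterances_by_speaker(index, speaker_id):
    """Collect one speaker's files across keywords, relying on shared speaker IDs."""
    return {
        keyword: speakers[speaker_id]
        for keyword, speakers in index.items()
        if speaker_id in speakers
    }


def count_utterances(index):
    """Total number of wav files found in the index."""
    return sum(len(files) for speakers in index.values() for files in speakers.values())
```

Because speaker IDs are shared between keywords, grouping by speaker in this way makes it straightforward to build speaker-disjoint splits, and `count_utterances` gives a total that can be compared against the 4,270 utterances stated in the description.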

The dataset is intended for research purposes only. Please cite our paper [1] if you use this dataset in your research:

@misc{1910.05171,
Author = {Byeonggeun Kim and Mingu Lee and Jinkyu Lee and Yeonseok Kim and Kyuwoong Hwang},
Title = {Query-by-example on-device keyword spotting},
Year = {2019},
Eprint = {arXiv:1910.05171},
}

[1] Byeonggeun Kim, Mingu Lee, Jinkyu Lee, Yeonseok Kim, and Kyuwoong Hwang, “Query-by-example on-device keyword spotting,” to be published in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), Sentosa, Singapore, Dec. 2019.