Gesture Recognition Dataset: Jester

Your model recognizes certain simple, single-frame gestures like a thumbs-up. But for a truly responsive, accurate system, you want your model to recognize complex gestures too, even when the differences between them are subtle. Is the person pointing to something or wagging their index finger? Is the hand cleaning the display or rubber-banding an image with two fingers? Given enough examples, your model can learn the difference.

The Jester gesture recognition dataset includes 148,092 labeled video clips of humans performing basic, pre-defined hand gestures in front of a laptop camera or webcam. It is designed for training machine learning models to recognize human hand gestures like sliding two fingers down, swiping left or right and drumming fingers.

The clips cover 27 classes, split in a ratio of roughly 8:1:1 into training, validation (development) and test sets. Two of the 27 classes are “no gesture” classes, which help the network distinguish between specific gestures and unknown hand movements.

In the age of mobile computing, gesture/action recognition and its role in human-computer interfaces have grown in importance. The Jester video dataset allows the training of robust machine learning models to recognize human hand gestures.

Sample classes

Doing other things
Rolling Hand Forward
Shaking Hand
Stop Sign
Swiping Left
Thumb Down
Thumb Up
Turning Hand Clockwise
Turning Hand Counterclockwise
Zooming Out With Full Hand
Zooming Out With Two Fingers

Dataset details

Total number of videos: 148,092
Training set: 118,562
Validation set: 14,787
Test set (without labels): 14,743
Labels: 27
Quality: 100px
FPS: 12
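As a quick sanity check, the split sizes above can be compared against the stated 8:1:1 ratio. The sketch below uses only the counts from this page; the names `SPLITS` and `split_fractions` are illustrative and not part of any official tooling.

```python
# Clip counts per split, copied from the dataset details above.
SPLITS = {
    "train": 118_562,
    "validation": 14_787,
    "test": 14_743,
}

def split_fractions(splits):
    """Return each split's share of the total number of clips."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}

if __name__ == "__main__":
    print("total clips:", sum(SPLITS.values()))
    for name, fraction in split_fractions(SPLITS).items():
        print(f"{name}: {fraction:.1%}")
```

The three counts add up to the 148,092 total, and each share lands close to the nominal 80/10/10 split.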

The Jester dataset was created with the help of more than 1,300 unique crowd actors.

Developers have successfully created classification models based on the training set and found that they perform well on the validation set. Running models on the test set, developers can achieve scores of up to 97 percent.

The video data is provided as one large TGZ archive, split into parts of at most 1 GB each. The total download size is 22.8 GB. The archive contains directories numbered from 1 to 148092. Each directory corresponds to one video and contains JPG images with a height of 100px and variable width. The JPG images were extracted from the original videos at 12 frames per second. The filenames of the JPGs start at 00001.jpg. The number of JPGs per directory varies with the length of the original video.
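The directory layout described above can be read with a few lines of Python. This is a minimal sketch, assuming the archive has already been extracted and that frame filenames keep a constant zero-padded width (as in 00001.jpg); the function names and the example path are hypothetical, not part of the dataset's own tooling.

```python
from pathlib import Path

FPS = 12  # extraction frame rate stated in the dataset description

def list_frames(video_dir):
    """Return the JPG frames of one clip in playback order.

    Assumes zero-padded names (00001.jpg, 00002.jpg, ...), so sorting
    the paths lexicographically matches the frame order.
    """
    return sorted(Path(video_dir).glob("*.jpg"))

def clip_duration_seconds(video_dir, fps=FPS):
    """Approximate clip length: number of extracted frames divided by fps."""
    return len(list_frames(video_dir)) / fps

if __name__ == "__main__":
    # Hypothetical location of the extracted archive; adjust as needed.
    example_dir = Path("jester_frames") / "1"
    if example_dir.is_dir():
        frames = list_frames(example_dir)
        print(len(frames), "frames,", clip_duration_seconds(example_dir), "seconds")
```

Because the number of frames varies from clip to clip, a training pipeline would typically pad or subsample the list returned by `list_frames` to a fixed length before batching.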

Dataset license

The Jester dataset is available for research purposes.

Dataset download

Please download ALL files, including the download instructions.

NOTE: Download speeds may be slower than usual due to increased traffic.

Citations

"The Jester Dataset: A Large-Scale Video Dataset of Human Gestures", J. Materzynska, G. Berger, I. Bax and R. Memisevic, IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019.

Qualcomm AI Research

AI is shifting from simply seeing what is happening in front of the camera to understanding it. Data is the effective force behind these deep learning breakthroughs and is integral to the human-level performance of neural networks. Our crowd-acting approach to data collection overcomes the typical limitations of crowdsourcing, resulting in high-quality video data that is densely captioned, human-centric and diverse.

Qualcomm AI Research continues to invest in and support deep-learning research in computer vision. The publication of the Jester dataset for use by the AI research community is one of our many initiatives.

Find out more about Qualcomm AI Research.

For any questions or technical support, please contact us at [email protected]

Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.