Moving Objects Dataset: Something-Something v. 2

Your model recognizes certain simple, single-frame gestures like a thumbs-up. But for a truly responsive, accurate system, you want your model to recognize gestures in the context of everyday objects. Is the person pointing to something or wagging their index finger? Is the hand cleaning the display or zooming in and out of an image with two fingers? Given enough examples, your model can learn the difference.

The Something-Something dataset (version 2) is a collection of 220,847 labeled video clips of humans performing pre-defined, basic actions with everyday objects. It is designed to train machine learning models in fine-grained understanding of human hand gestures like putting something into something, turning something upside down and covering something with something.

Samples from the Something-Something dataset:

Putting something on a surface
Moving something up
Covering something with something
Pushing something from left to right
Moving something down
Pushing something from right to left
Uncovering something
Taking one of many similar things on the table
Turning something upside down
Tearing something into two pieces
Putting something into something
Squeezing something
Throwing something
Putting something next to something
Poking something so lightly that it doesn't or almost doesn't move

Total number of videos	220,847
Training Set	168,913
Validation Set	24,777
Test Set (w/o labels)	27,157
Labels	174
Quality	100px
FPS	12
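
The split sizes in the table can be checked against the stated total. The following is a minimal Python sketch that uses only the numbers listed above; the variable names are illustrative.

```python
# Split sizes as listed in the dataset details table.
splits = {
    "train": 168_913,
    "validation": 24_777,
    "test": 27_157,
}
total_videos = 220_847

# The three splits should account for every video in the dataset.
assert sum(splits.values()) == total_videos

# Share of the dataset that falls into each split.
for name, count in splits.items():
    print(f"{name:>10}: {count:>7,} videos ({count / total_videos:.1%})")
```

The training, validation and test sets make up about 76.5%, 11.2% and 12.3% of the videos, respectively.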

The dataset was created with the help of more than 1,300 unique crowd actors.

Developers like you have successfully trained classification models on the training set and found that they perform well on the validation set. On the test set, these models have achieved scores of up to 91 percent.

The video data is provided as one large TGZ archive, split into parts of at most 1 GB each. The total download size is 19.4 GB. The archive contains WebM files encoded with the VP9 codec at a height of 240px. Files are numbered from 1 to 220847.
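
One way to unpack a multi-part archive like this is to read the parts back to back as a single gzip stream, which avoids writing a second full-size copy to disk. The following Python sketch is an illustration under assumptions, not the official procedure: the part file names are hypothetical, the parts are assumed to be plain byte-wise slices of one TGZ file, and the command in the download instructions remains authoritative.

```python
import tarfile
from pathlib import Path


class ChainedReader:
    """Minimal file-like object that reads several files back to back."""

    def __init__(self, paths):
        self._paths = [Path(p) for p in paths]
        self._index = 0
        self._current = None

    def read(self, size=-1):
        out = bytearray()
        # Keep reading until the request is satisfied or all parts are used up.
        while self._index < len(self._paths) and (size < 0 or len(out) < size):
            if self._current is None:
                self._current = open(self._paths[self._index], "rb")
            want = -1 if size < 0 else size - len(out)
            data = self._current.read(want)
            if data:
                out += data
            else:
                # Current part is exhausted; move on to the next one.
                self._current.close()
                self._current = None
                self._index += 1
        return bytes(out)


def extract_split_archive(part_paths, dest_dir):
    """Extract a gzip-compressed tar archive stored as ordered parts."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    reader = ChainedReader(part_paths)
    # "r|gz" opens the archive in streaming mode, so only read() is needed.
    with tarfile.open(fileobj=reader, mode="r|gz") as tar:
        # Use the safer "data" extraction filter where Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)


# Hypothetical usage; the real part names come from the download package:
# parts = sorted(Path("downloads").glob("<part-name-prefix>*"))
# extract_split_archive(parts, "videos")
```

The parts must be passed in their original order, since the sketch simply concatenates their bytes before decompression.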

For each video in the training and validation sets there is an object annotation in addition to the video label, if applicable. For example, for a label like "Putting [something] onto [something]," there is also an annotated version, such as "Putting a cup onto a table." In total, there are 318,572 annotations involving 30,408 unique objects.
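
The relationship between a label template and its annotated version can be illustrated with a small Python sketch. The function name `fill_template` and the list-of-objects input are assumptions made for illustration; the actual layout of the label files is defined by the download package.

```python
import re

# Matches one bracketed placeholder such as "[something]".
PLACEHOLDER = re.compile(r"\[[^\]]*\]")


def fill_template(template, objects):
    """Replace each bracketed placeholder with the next object, in order."""
    slots = PLACEHOLDER.findall(template)
    if len(slots) != len(objects):
        raise ValueError(
            f"template has {len(slots)} placeholders, got {len(objects)} objects"
        )
    remaining = iter(objects)
    return PLACEHOLDER.sub(lambda match: next(remaining), template)


print(fill_template("Putting [something] onto [something]", ["a cup", "a table"]))
# Putting a cup onto a table
```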

To reduce label noise, five different crowd actors have verified that the action shown in each video matches the description given. The dataset contains only those videos in which all five crowd actors confirmed the match.
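
The unanimous-agreement rule amounts to a simple filter. The following is a minimal Python sketch, assuming a hypothetical mapping from video ID to five yes/no verification votes; the real verification records are not described here.

```python
REQUIRED_VOTES = 5  # number of crowd actors who verify each video


def keep_verified(votes_by_video):
    """Return the IDs of videos that all five verifiers confirmed."""
    return [
        video_id
        for video_id, votes in votes_by_video.items()
        if len(votes) == REQUIRED_VOTES and all(votes)
    ]


# Made-up example data: only video "1" has five confirming votes.
example_votes = {
    "1": [True, True, True, True, True],
    "2": [True, True, False, True, True],
    "3": [True, True, True, True],
}
print(keep_verified(example_votes))  # ['1']
```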

Something-Something is freely available for research purposes.

Data License Agreement - Research Use

Download (149.8 KB)

Updated 11 Aug 22

20BN-Something-Something Download Package Labels

Download (4.9 MB)

Updated 03 Dec 21

View License Agreement

You will need to be logged in with your Qualcomm OneID account in order to download. If you do not have an active account, please click the "Register" button at the top of the page to get started.

Please download ALL files, including the download instructions. You will need to unzip each downloaded file separately, and then run the command given in the download instructions to extract all video files.

NOTE: Download speeds may be slower than usual due to increased traffic.

“The ‘something something’ video database for learning and evaluating visual common sense,” Goyal, R. et al., arXiv.org, June 15, 2017.

“On the effectiveness of task granularity for transfer learning,” Mahdisoltani, F. et al., arXiv.org, November 29, 2018.

AI is shifting from simply seeing what is happening in front of the camera to understanding it. Data is the effective force behind these deep learning breakthroughs and is integral to the human-level performance of neural networks. Our crowd-acting approach to data collection overcomes the typical limitations of crowdsourcing, resulting in high-quality video data that is densely captioned, human-centric and diverse.

Qualcomm AI Research continues to invest in and support deep-learning research in computer vision. The publication of the Something-Something dataset for use by the AI research community is one of our many initiatives.

Find out more about Qualcomm AI Research.

For any questions or technical support, please contact us at [email protected]

Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
