MediaMill Datasets and Software

MediaMill Concept Vocabulary

The MediaMill Concept Vocabulary provides a high-level representation for a subset of the HAVIC corpus from the TRECVID multimedia event detection task. The vocabulary covers a wide range of concepts from various categories, including objects, actions, scenes, humans and animals. The diversity and size of the vocabulary make it suitable for studying high-level event representations.

Concept Vocabulary

The MediaMill Concept Vocabulary contains 1,346 concept detectors based on two public data sets:

346 concept detectors trained on the TRECVID 2012 Semantic Indexing task. The detectors are trained on 400,000 annotated keyframes using the MediaMill 2012 concept detection system.
1,000 concept detectors trained on the ImageNet Large Scale Visual Recognition Challenge 2011. The detectors are trained on 1,300,000 annotated images using the MediaMill 2012 concept detection system.

The concept names are specified in the download materials. The following figure shows some examples of the concepts in our vocabulary.

Dataset Content

The dataset is based on 13,049 videos from the TRECVID multimedia event detection corpus, available from LDC. We divided the videos into a training set and a validation set. The former contains 8,824 videos and the latter 4,425 videos. Each video is represented by applying the vocabulary's concept detectors to frames extracted from it (1 frame per 2 seconds) and then aggregating the frame-level outputs to the video level.
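
The aggregation step described above can be illustrated with a minimal sketch. The text does not say which aggregation function was used, so average pooling is an assumption here, and the function name average_pool is hypothetical. The input is one list of detector scores per sampled frame.

```python
from typing import List, Sequence


def average_pool(frame_scores: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate per-frame concept scores into one video-level vector.

    ASSUMPTION: average pooling. The dataset description only says the
    frame-level outputs are "aggregated", not how.
    """
    if not frame_scores:
        raise ValueError("need at least one frame")
    width = len(frame_scores[0])
    if any(len(frame) != width for frame in frame_scores):
        raise ValueError("all frames must have the same number of scores")
    count = len(frame_scores)
    # zip(*frame_scores) yields one column per concept detector.
    return [sum(column) / count for column in zip(*frame_scores)]
```

Other choices, such as taking the maximum score per concept, would fit the same interface; only the last line would change.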

train.txt: This file contains the high-level representation of the 8,824 training videos. Each row represents a video and consists of 1,347 tokens separated by a space character. The first token is the video name and the other 1,346 tokens are the concept detector outputs. The detector outputs are normalized to the range 0 to 1 and can be treated as the probabilities that the concepts appear in the video.
val.txt: This file contains the high-level representation of the 4,425 validation videos, which serve as the test set. Each row represents a video and consists of 1,347 tokens separated by a space character. The first token is the video name and the other 1,346 tokens are the concept detector outputs. The detector outputs are normalized to the range 0 to 1 and can be treated as the probabilities that the concepts appear in the video.
concept names: This file lists the names of the 1,346 concepts in the same order as the columns of train.txt and val.txt. The first 346 concepts are from the TRECVID 2012 Semantic Indexing task and the last 1,000 concepts are from the ImageNet Large Scale Visual Recognition Challenge 2011.
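
The row layout described above can be read with a few lines of Python. This is a minimal sketch, assuming the files are plain text with whitespace-separated tokens; the function names parse_row, load_representation and split_by_source are hypothetical and not part of the download.

```python
from typing import Dict, List, Tuple

NUM_TRECVID = 346     # first 346 columns: TRECVID 2012 Semantic Indexing
NUM_IMAGENET = 1000   # last 1,000 columns: ImageNet challenge 2011
NUM_CONCEPTS = NUM_TRECVID + NUM_IMAGENET  # 1,346 detector outputs per row


def parse_row(line: str) -> Tuple[str, List[float]]:
    """Split one row into the video name and its 1,346 detector scores."""
    tokens = line.split()
    if len(tokens) != NUM_CONCEPTS + 1:
        raise ValueError(
            f"expected {NUM_CONCEPTS + 1} tokens, got {len(tokens)}"
        )
    return tokens[0], [float(token) for token in tokens[1:]]


def load_representation(path: str) -> Dict[str, List[float]]:
    """Read a file such as train.txt into a {video_name: scores} mapping."""
    videos: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines, if any
                name, scores = parse_row(line)
                videos[name] = scores
    return videos


def split_by_source(scores: List[float]) -> Tuple[List[float], List[float]]:
    """Separate the TRECVID scores from the ImageNet scores."""
    return scores[:NUM_TRECVID], scores[NUM_TRECVID:]
```

Calling load_representation("train.txt") would then give one score vector per training video, and split_by_source makes it easy to experiment with either half of the vocabulary on its own.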

Download

The MediaMill Concept Vocabulary is available for download here.

Contact

If you have any questions, please contact Amirhossein Habibian at a.habibian@uva.nl.

Readme First
Amirhossein Habibian, Koen E.A. van de Sande, and Cees G.M. Snoek. Recommendations for Video Event Recognition Using Concept Vocabularies. In Proceedings of the ACM International Conference on Multimedia Retrieval, pp. 89-96, Dallas, USA, April 2013.