@InProceedings{SnoekPTRECVID2012a,
  author    = "Snoek, C. G. M. and van de Sande, K. E. A. and Habibian, A. and Kordumova, S.
               and Li, Z. and Mazloom, M. and Pintea, S. L. and Tao, R.
               and Koelma, D. C. and Smeulders, A. W. M.",
  title     = "The {MediaMill} {TRECVID} 2012 Semantic Video Search Engine",
  booktitle = "TRECVID Workshop",
  year      = "2012",
  url       = "https://ivi.fnwi.uva.nl/isis/publications/2012/SnoekPTRECVID2012a",
  pdf       = "https://ivi.fnwi.uva.nl/isis/publications/2012/SnoekPTRECVID2012a/SnoekPTRECVID2012a.pdf",
  abstract  = "In this paper we describe our TRECVID 2012 video retrieval experiments. The MediaMill team participated in four tasks: semantic indexing, multimedia event detection, multimedia event recounting, and instance search. The starting point for the MediaMill detection approach is our top-performing bag-of-words system of TRECVID 2008--2011, which uses multiple color SIFT descriptors, averaged and difference coded into codebooks with spatial pyramids, and kernel-based machine learning. This year our concept detection experiments focus on establishing the influence of difference coding, the use of audio features, concept-pair detection using regular concepts, pair detection by spatiotemporal objects, and concept(-pair) detection without annotations. Our event detection and recounting experiments focus on representations using concept detectors. For instance search we study the influence of spatial verification and color invariance. Our participation in the 2012 edition of the TRECVID benchmark has again been fruitful, resulting in the runner-up ranking for concept detection in the semantic indexing task."
}