The problem of event representation for automatic event detection in Internet videos is gaining importance because of its relevance to a large number of applications. Existing methods represent events either with low-level descriptors or with domain-specific models suited only to a limited class of videos, ignoring the high-level meaning of the events. Aiming ultimately for a more robust and meaningful representation, in this paper we ask whether object detectors can aid video event retrieval. We present an experimental study that investigates the utility of present-day local and global object detectors for video event search. By evaluating object detectors optimized for high-quality photographs on low-quality Internet video, we establish that present-day detectors can successfully be used to recognize objects in web videos. We use an object-based representation to re-rank the results of an appearance-based event detector. Results on the challenging TRECVID multimedia event detection corpus demonstrate that objects can indeed aid event retrieval. While much remains to be studied, we believe that our experimental study is a first step towards revealing the potential of object-based event representations.
@InProceedings{ModoloSEI2013,
author = "Modolo, D. and Snoek, C. G. M.",
title = "Can Object Detectors Aid Internet Video Event Retrieval?",
booktitle = "IS\&T/SPIE Symposium on Electronic Imaging",
year = "2013",
url = "https://ivi.fnwi.uva.nl/isis/publications/2013/ModoloSEI2013",
pdf = "https://ivi.fnwi.uva.nl/isis/publications/2013/ModoloSEI2013/ModoloSEI2013.pdf",
has_image = 1
}