In this paper, we review 300 references on video retrieval, indicating when text-only solutions are unsatisfactory and showing the promising alternatives, the majority of which are concept-based. Central to our discussion, therefore, is the notion of a semantic concept: an objective linguistic description of an observable entity. Specifically, we present our view on how its automated detection, selection under uncertainty, and interactive usage might solve the major scientific problem for video retrieval: the semantic gap. To bridge the gap, we lay down the anatomy of a concept-based video search engine. We present a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human-computer interaction. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits. Because of these differences, we cannot understand the progress in video retrieval without serious evaluation efforts such as those carried out in the NIST TRECVID benchmark. We discuss its data, tasks, results, and the many derived community initiatives in creating annotations and baselines for repeatable experiments. We conclude with our perspective on future challenges and opportunities.
A printed and bound version of this article is available at a 50% discount from Now Publishers. This can be obtained by entering the promotional code INR002004 on the order form at Now Publishers.
@Article{SnoekFTIR2009,
author = "Snoek, C. G. M. and Worring, M.",
title = "Concept-Based Video Retrieval",
journal = "Foundations and Trends in Information Retrieval",
number = "4",
volume = "2",
pages = "215--322",
year = "2009",
url = "https://ivi.fnwi.uva.nl/isis/publications/2009/SnoekFTIR2009",
pdf = "https://ivi.fnwi.uva.nl/isis/publications/2009/SnoekFTIR2009/SnoekFTIR2009.pdf",
has_image = 1
}