Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is infeasible for large video collections. In this paper we survey several methods that aim to automate this time- and resource-consuming process. Good reviews of single-modality video indexing have appeared in the literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in a collaborative fashion. Therefore, instead of treating the different information sources and their specific algorithms separately, we focus on the similarities and differences between the modalities. To that end we put forward a unifying and multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in the literature. It furthermore forms the basis for categorizing these different methods.
@Article{SnoekMTA2005,
author = "Snoek, C. G. M. and Worring, M.",
title = "Multimodal Video Indexing: A Review of the State-of-the-Art",
journal = "Multimedia Tools and Applications",
number = "1",
volume = "25",
pages = "5--35",
year = "2005",
url = "https://ivi.fnwi.uva.nl/isis/publications/2005/SnoekMTA2005",
pdf = "https://ivi.fnwi.uva.nl/isis/publications/2005/SnoekMTA2005/SnoekMTA2005.pdf",
has_image = 1
}