In this technical demonstration, we showcase a multimedia search engine that retrieves a video given a sentence, or a sentence given a video. The key novelty is its machine translation capability, which exploits a cross-media representation for both the visual and textual modalities, built on concept vocabularies. We will demonstrate the translations using arbitrary web videos and sentences related to everyday events. In addition, we will provide an automatically generated explanation, in terms of concept detectors, of why a particular video or sentence has been retrieved as the most likely translation.
@InProceedings{HabibianICM2013,
author = "Habibian, A. and Snoek, C. G. M.",
title = "{Video2Sentence} and Vice Versa",
booktitle = "ACM International Conference on Multimedia",
year = "2013",
url = "https://ivi.fnwi.uva.nl/isis/publications/2013/HabibianICM2013",
pdf = "https://ivi.fnwi.uva.nl/isis/publications/2013/HabibianICM2013/HabibianICM2013.pdf",
has_image = 1
}