The MediaMill TRECVID 2005 Semantic Video Search Engine. C. G. M. Snoek, J. C. van Gemert, J. M. Geusebroek, B. Huurnink, D. C. Koelma, G. P. Nguyen, O. de Rooij, F. J. Seinstra, A. W. M. Smeulders, C. J. Veenman, M. Worring. In TRECVID Workshop, 2005.
In this paper we describe our TRECVID 2005 experiments. The UvA-MediaMill team participated in four tasks. For the detection of camera work (runid: A_CAM) we investigate the benefit of using a tessellation of detectors in combination with supervised learning over a standard approach using global image information. Experiments indicate that average precision increases drastically, especially for pan (+51%) and tilt (+28%). For concept detection we propose a generic approach using our semantic pathfinder. The most important novelty compared to last year's system is the improved visual analysis using proto-concepts based on Wiccest features. In addition, the path selection mechanism was extended. Based on the semantic pathfinder architecture, we are currently able to detect an unprecedented lexicon of 101 semantic concepts in a generic fashion. We performed a large set of experiments (runid: B_vA). The results show that an optimal strategy for generic multimedia analysis is one that learns from the training set, on a per-concept basis, which tactic to follow. Experiments also indicate that our visual analysis approach is highly promising. The lexicon of 101 semantic concepts forms the basis for our search experiments (runid: B_2_A-MM). We participated in automatic, manual (using only visual information), and interactive search. The lexicon-driven retrieval paradigm aids substantially in all search tasks. When coupled with interaction, exploiting several novel browsing schemes of our semantic video search engine, results are excellent: we obtain a top-3 result for 19 out of 24 search topics. In addition, we obtain the highest mean average precision of all search participants. We exploited the technology developed for the above tasks to explore the BBC rushes. The most intriguing result is that, of the 101 visual-only concept models trained on news data, 25 also perform reasonably well on BBC data.
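The abstract reports results as (non-interpolated) average precision and states that the best strategy learns, per concept, which analysis path to follow. The sketch below illustrates both ideas under stated assumptions: average precision is computed over a fully judged ranked list, and each concept simply keeps the candidate path with the highest average precision on held-out training data. The function names `average_precision` and `select_paths` and the path names `visual_only` and `multimodal` are hypothetical; this is not the semantic pathfinder's actual selection procedure, which the abstract does not spell out.

```python
from typing import Dict, List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision of a ranked list.

    Items are ranked by descending score. AP is the mean of the precision
    values measured at the rank of each relevant (label == 1) item.
    Assumes every relevant item appears in the list.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def select_paths(
    validation_scores: Dict[str, Dict[str, List[float]]],
    labels: Dict[str, List[int]],
) -> Dict[str, str]:
    """Per concept, keep the (hypothetical) path with the highest validation AP."""
    chosen: Dict[str, str] = {}
    for concept, path_scores in validation_scores.items():
        chosen[concept] = max(
            path_scores,
            key=lambda path: average_precision(path_scores[path], labels[concept]),
        )
    return chosen


if __name__ == "__main__":
    # Toy validation data: detector scores for four shots per concept.
    labels = {"pan": [1, 0, 1, 0], "tilt": [0, 1, 0, 0]}
    validation_scores = {
        "pan": {"visual_only": [0.9, 0.8, 0.7, 0.1], "multimodal": [0.9, 0.1, 0.8, 0.2]},
        "tilt": {"visual_only": [0.2, 0.9, 0.3, 0.1], "multimodal": [0.5, 0.4, 0.3, 0.2]},
    }
    print(select_paths(validation_scores, labels))
```

On the toy data, `pan` keeps `multimodal` (AP 1.0 versus 5/6) while `tilt` keeps `visual_only` (AP 1.0 versus 0.5). This illustrates why a single fixed strategy for all concepts can be suboptimal.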
@InProceedings{SnoekPTRECVID2005,
author = "Snoek, C. G. M. and van Gemert, J. C. and Geusebroek, J. M. and Huurnink, B.
and Koelma, D. C. and Nguyen, G. P. and de Rooij, O. and Seinstra, F. J.
and Smeulders, A. W. M. and Veenman, C. J. and Worring, M.",
title = "The MediaMill TRECVID 2005 Semantic Video Search Engine",
booktitle = "TRECVID Workshop",
year = "2005",
url = "https://ivi.fnwi.uva.nl/isis/publications/2005/SnoekPTRECVID2005",
pdf = "https://ivi.fnwi.uva.nl/isis/publications/2005/SnoekPTRECVID2005/SnoekPTRECVID2005.pdf",
has_image = 1
}