Visual-Concept Search Solved?

Research has reached the point where one part of the community suggests that visual search is practically solved and that progress has been only incremental, while another part argues that current solutions are weak and generalize poorly. We conducted an experiment to shed light on the issue.

Experiment

We assessed image-categorization progress by comparing a state-of-the-art search engine from 2006 with one from 2009.

We used four mixtures of two broadcast video data sets obtained from the 2005 and 2007 TRECVID video-retrieval benchmarks. The first data set was from the MediaMill Challenge and included 85 hours of shot-segmented news video from China, Lebanon, and the US; the second was the training set of the TRECVID 2007 benchmark and contained 56 hours of shot-segmented Dutch documentary video. We split each video data set into independent training (70 percent) and test (30 percent) sets.

In our experiment, we used the two search engines to detect the most common visual concepts in the literature, namely the 36 concepts defined in LSCOM, as labeled manually for both data sets. We considered two situations: one where the training data was visually similar to the test data, and one where the training data differed visually from the test data.
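The setup above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that the four mixtures are the four train/test pairings of the two data sets, and that the 70/30 split is a random split over shots, neither of which the text states explicitly. The names `split_shots` and `build_mixtures` and the data set labels are hypothetical.

```python
import random
from itertools import product


def split_shots(shots, train_fraction=0.7, seed=0):
    """Split a list of shots into disjoint training and test parts.

    Assumption: a seeded random shuffle; the source does not say how
    the 70/30 split was drawn.
    """
    shuffled = list(shots)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def build_mixtures(datasets, train_fraction=0.7, seed=0):
    """Pair every training set with every test set.

    With two data sets this yields four mixtures: two where training and
    test data share an origin, and two where they come from different origins.
    """
    splits = {
        name: split_shots(shots, train_fraction, seed)
        for name, shots in datasets.items()
    }
    mixtures = []
    for train_name, test_name in product(splits, repeat=2):
        mixtures.append({
            "train_source": train_name,
            "test_source": test_name,
            "train": splits[train_name][0],
            "test": splits[test_name][1],
            "same_origin": train_name == test_name,
        })
    return mixtures


if __name__ == "__main__":
    # Toy stand-ins for shot identifiers; real data would be video shots.
    toy = {"news": list(range(10)), "documentary": list(range(100, 110))}
    for m in build_mixtures(toy):
        print(m["train_source"], "->", m["test_source"],
              "same origin" if m["same_origin"] else "different origin")
```

Training and testing a concept detector on each of the four mixtures would then give, per concept, a score for the similar-origin case and a score for the different-origin case.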

Results

Visual-search progress as evaluated on 36 concept detectors (·) derived from broadcast video data using state-of-the-art search engines from 2006 and 2009. The figure highlights performance for three typical concepts. The top of the skewed bar indicates the maximum average performance by training on similar examples, and the bottom indicates the minimum performance when training on a data set of completely different origin. A mean average precision score of 0.25 (dotted line) is generally accepted to be sufficient for interactive search. The horizontal dashed line represents Google's text-search performance. Contrary to belief in the community, progress in visual search is substantial and visual-concept search is quickly maturing in robustness for real-world usage of any concept.

As the figure shows, search engine performance doubled in just three years. For learned concepts, detection rates degraded when the detectors were applied to data of a different origin, yet they too doubled in three years. Thus, contrary to the widespread belief that visual-search progress is incremental and detectors generalize poorly, our experiment shows that progress is substantial on both counts.
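For readers unfamiliar with the metric, the mean average precision scores reported above can be computed roughly as follows. This is a sketch of the standard non-interpolated average precision, assuming every relevant shot appears somewhere in the ranked list; the benchmark's official evaluation may differ in detail (for example, in how deep the ranking is evaluated). The function names are hypothetical.

```python
from typing import Mapping, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average of precision@k over the ranks k that hold a relevant shot.

    `ranked_relevance[i]` is True when the shot at rank i + 1 contains
    the concept. Assumes all relevant shots are in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Mapping[str, Sequence[bool]]) -> float:
    """Mean of the per-concept average precision scores."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)


if __name__ == "__main__":
    # Toy rankings for two made-up concepts.
    toy = {"concept_a": [True, False], "concept_b": [False, True]}
    print(mean_average_precision(toy))
```

Under this definition, a detector that ranks all relevant shots first scores 1.0, and the 0.25 level mentioned in the caption is the point the text describes as generally sufficient for interactive search.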

Readme First
Cees G.M. Snoek and Arnold W.M. Smeulders. Visual-Concept Search Solved? IEEE Computer, 43(6):76-78, 2010.