MediaMill Datasets

MediaMill Bi-concepts

The MediaMill Bi-concept dataset provides a new baseline for image retrieval beyond single concepts. Searching for the co-occurrence of two visual concepts in unlabeled images is an important step towards answering complex user queries. While traditional methods rely on artificial combinations of individual single-concept detectors, bi-concept search is a new concept-based retrieval method that uses bi-concept detectors directly. Bi-concept search is found to outperform oracle linear fusion of single-concept search.


The MediaMill Bi-concept dataset contains:

  • Ground truth for 15 bi-concepts and 1 tri-concept. Each bi-concept has 50 positive test examples, and 10,000 negative test examples. See the folder Annotations.
  • Bi-concept image search results, retrieved by the three systems, i.e., social, borda, and full, with varying configurations as described in the bi-concept paper.
  • Python code for comparing your system with the baseline.

Download (25MB): all in one package.

To compare your system with the baselines, follow these two steps:

Step 1. For each bi-concept w, use your system, say 'systemX', to score the 50 positive test examples and the 10,000 negative test examples, and sort them. Save the sorted results to SimilarityIndex/test/systemX/w.txt. The file w.txt must contain 10,050 lines (50 positives plus 10,000 negatives), where each line starts with a photo id followed by the corresponding score.
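The output format of Step 1 can be sketched in Python as follows. This is a minimal illustration, not the code shipped in the package: the function name write_ranked_results is hypothetical, and the single-space separator, the descending sort order, and the dictionary input are assumptions, since the page only states that each line starts with a photo id followed by its score.

```python
import os


def write_ranked_results(scores, bi_concept, system_name="systemX",
                         root=os.path.join("SimilarityIndex", "test")):
    """Write one '<photo_id> <score>' line per test image, best score first.

    scores: dict mapping photo id -> score from your own system.
    The space separator and descending order are assumptions.
    """
    # Sort by score, highest first, so the most relevant photos lead the file.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

    out_dir = os.path.join(root, system_name)
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, bi_concept + ".txt")

    with open(out_path, "w") as f:
        for photo_id, score in ranked:
            f.write(f"{photo_id} {score}\n")
    return out_path
```

Calling write_ranked_results(scores, "w") with a dictionary that covers all 10,050 test images of bi-concept w would produce SimilarityIndex/test/systemX/w.txt relative to the current directory.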

Step 2. Assuming that the biconcepts2012test folder is placed at C:/VisualSearch/, set the variable ROOT_PATH in to "C:/VisualSearch/". Add 'systemX' to the variable rankerNameList in pycode/, and run the Python script to compute the Average Precision scores of the individual systems.
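For readers who want to check their numbers independently, the sketch below computes non-interpolated Average Precision from a ranked result file. It is not the script from the package: the helper names read_ranked_ids and average_precision are hypothetical, the file is assumed to be whitespace-separated with the photo id as the first token, and the score is normalized by the total number of positives, which may differ from the convention used by the packaged code.

```python
def read_ranked_ids(path):
    """Return photo ids in file order; the id is assumed to be the first token."""
    with open(path) as f:
        return [line.split()[0] for line in f if line.strip()]


def average_precision(ranked_ids, positive_ids):
    """Non-interpolated AP: mean of precision values at each positive's rank."""
    positives = set(positive_ids)
    if not positives:
        return 0.0

    hits = 0
    precision_sum = 0.0
    for rank, photo_id in enumerate(ranked_ids, start=1):
        if photo_id in positives:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Positives missing from the ranking contribute zero.
    return precision_sum / len(positives)
```

As a small check, average_precision(["a", "b", "c", "d"], {"a", "c"}) gives (1/1 + 2/3) / 2, roughly 0.833. In practice, ranked_ids would come from read_ranked_ids on a file written in Step 1, and positive_ids from the ground truth in the Annotations folder.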


If you have any questions, please contact Xirong Li.

Readme First
Xirong Li, Cees G. M. Snoek, Marcel Worring, and Arnold W. M. Smeulders. Harvesting Social Images for Bi-Concept Search. IEEE Transactions on Multimedia, vol. 14, iss. 4, pp. 1091-1104, 2012.