MediaMill Datasets

MediaMill Tag Relevance

The MediaMill Tag Relevance dataset provides all the data you need to estimate tag relevance for social-tagged images. Tag relevance exploits the intuition that if different people label visually similar images with the same tags, those tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose a neighbor voting algorithm that accurately and efficiently learns tag relevance by accumulating votes from visual neighbors.

Estimating Tag Relevance by Neighbor Voting
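
The neighbor voting idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the tuple layout of the collection, the Euclidean distance, the one-vote-per-user rule, and the subtraction of a prior (the votes a tag would be expected to get from randomly chosen images) are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def tag_relevance(query_feature, query_user, collection, k):
    """Score tags for a query image by counting votes from visual neighbors.

    collection is a list of (feature_vector, user_id, tags) tuples.
    This layout is hypothetical and is not the dataset's file format.
    """
    n = len(collection)
    # Number of images in the collection labeled with each tag.
    tag_freq = Counter(tag for _, _, tags in collection for tag in set(tags))

    # Rank all images by visual distance to the query (brute force).
    ranked = sorted(collection, key=lambda item: dist(query_feature, item[0]))

    votes = Counter()
    seen_users = {query_user}  # assumption: the query's own user does not vote
    neighbors = 0
    for _, user, tags in ranked:
        if user in seen_users:
            continue  # assumption: at most one vote per user
        seen_users.add(user)
        votes.update(set(tags))
        neighbors += 1
        if neighbors == k:
            break

    # Subtract the votes expected from the same number of random images,
    # so that tags which are merely frequent do not dominate.
    return {tag: votes[tag] - neighbors * tag_freq[tag] / n for tag in votes}


# Toy collection: three "dog" images close together, three unrelated images.
collection = [
    ((0.0, 0.0), "u1", ["dog", "pet"]),
    ((0.1, 0.0), "u2", ["dog"]),
    ((0.0, 0.2), "u3", ["dog", "2008"]),
    ((5.0, 5.0), "u4", ["2008"]),
    ((6.0, 5.0), "u5", ["2008", "car"]),
    ((5.0, 6.0), "u6", ["car"]),
]
scores = tag_relevance((0.0, 0.1), "q", collection, k=3)
```

In this toy example the tag "dog" is shared by all three visual neighbors and scores highest, while "2008" is as frequent overall but receives only one vote and ends up below zero. A collection of millions of images would need an approximate nearest-neighbor index rather than the brute-force sort used here.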

Tag Relevance dataset

The MediaMill Tag Relevance dataset contains:

  • Flickr3.5M images: A collection of 3.5 million social-tagged image IDs from Flickr. You can use the file to obtain image source URLs and download the original images (see get-images/
  • Flickr3.5M meta: Metadata for the 3.5 million images, including user tags, titles, descriptions, geo information, etc.
  • Social20: Ground truth for 20 visual categories for social-tagged image retrieval.

Dataset Statistics

No. of images: ~3,500,000
No. of unique tags: ~570,000
No. of unique user IDs: ~270,000
Proportion of images with faces detected by OpenCV: ~18%

Most frequent tags in the Flickr3.5M collection.
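
As a rough illustration of how statistics such as the unique-tag count, the unique-user count, and the most frequent tags could be recomputed from the metadata, here is a small sketch. The record layout of (user_id, tags) pairs and the function name are hypothetical; the actual format of the Flickr3.5M meta file is not described here and would need its own parser.

```python
from collections import Counter


def tag_statistics(records, top_n=10):
    """Summarize a list of (user_id, tags) records, one record per image.

    The record layout is an assumption made for illustration.
    """
    tag_counts = Counter()
    users = set()
    for user_id, tags in records:
        users.add(user_id)
        # Count each tag once per image, even if it is repeated.
        tag_counts.update(set(tags))
    return {
        "images": len(records),
        "unique_tags": len(tag_counts),
        "unique_users": len(users),
        "most_frequent": tag_counts.most_common(top_n),
    }


# Toy records standing in for parsed metadata.
records = [
    ("u1", ["beach", "sunset"]),
    ("u2", ["beach"]),
    ("u1", ["beach", "dog", "dog"]),
]
stats = tag_statistics(records, top_n=2)
```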


  • Tag Relevance - Flickr3.5M images
  • Tag Relevance - Flickr3.5M MetaData
  • Tag Relevance - Social20


If you have any questions, please contact Xirong Li at

Readme First
Xirong Li, Cees G. M. Snoek, and Marcel Worring. Learning Social Tag Relevance by Neighbor Voting. IEEE Transactions on Multimedia, vol. 11, iss. 7, pp. 1310-1322, 2009.