MediaMill Datasets

MediaMill Tag Relevance Learning

The MediaMill Tag Relevance Learning software provides a solution to the problem of social-tagged images being uncontrolled, ambiguous, and overly personalized. Our software exploits the intuition that if different persons label visually similar images using the same tags, these tags are likely to reflect objective aspects of the visual content. Starting from this intuition, we propose a neighbor voting algorithm which accurately and efficiently learns tag relevance by accumulating votes from visual neighbors.
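The voting scheme can be sketched in a few lines. This is an illustrative reimplementation, not the shipped code: a tag's relevance is the number of the image's k visual neighbors carrying that tag, corrected by the count expected from the tag's prior frequency in the collection.

```python
from collections import Counter

def neighbor_vote(neighbor_tags, tag_freq, collection_size, k):
    """Illustrative neighbor voting: a tag's relevance is the number of the
    k visual neighbors labeled with it, minus the number of votes expected
    by chance given the tag's overall frequency in the collection."""
    votes = Counter()
    for tags in neighbor_tags:      # tag set of one visual neighbor
        for tag in set(tags):       # each neighbor votes at most once per tag
            votes[tag] += 1
    return {tag: count - k * tag_freq.get(tag, 0) / float(collection_size)
            for tag, count in votes.items()}
```

For example, with three neighbors tagged {"bird", "nature"}, {"bird"}, {"sky"} and prior counts bird=100, nature=500, sky=50 in a 1000-image collection, "bird" scores 2 - 0.3 = 1.7 while the frequent but unsupported "nature" scores 1 - 1.5 = -0.5.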


The software package contains:

  • Code (Windows) (~400KB): Supports tag relevance learning and automated image annotation.
  • Training data (~420MB): 1.2 million socially tagged images as training data, with COLOR64 as the visual feature.
  • Test data (~9MB): Ground truth for tag-based social image retrieval.

Construct a tag relevance learner

The TagrelLearner class implements the neighbor voting algorithm. Initializing a TagrelLearner instance requires three parameters: collection, the name of the socially tagged image collection used for neighbor voting; feature, which determines the visual neighbors; and tpp, which specifies the tag preprocessing technique.

Assuming that the VisualSearch folder is placed at C:/VisualSearch/, set the variable ROOT_PATH to "C:/VisualSearch/".

>>> from tagrellearner import TagrelLearner
>>> tagrel = TagrelLearner(collection="flickr1m", feature="color64", tpp="lemm")
>>> tagrel.set_nr_neighbors(1000) #use 1000 visual neighbors for voting

The collection "flickr1m" consists of 1.2 million socially tagged images, created for the TMM paper Harvesting Social Images for Bi-Concept Search. The feature "color64" is a compact 64-dim global feature (cf. Mingjing Li, Texture Moment for Content-Based Image Retrieval, ICME 2007). Values of tpp can be "raw", "stem", or "lemm", which select raw tags, Porter-stemmed tags, and lemmatized tags, respectively.

Learn tag relevance for individual images in two steps

Given a socially tagged image and its social tags, for instance:

>>> qry_tags = "humming bird yellow flower broad tailed nature wild wildlife colorado spring boulder rocky sigma public"

Step 1. Extract the visual feature (currently COLOR64) from the image content (prerequisite: the Python Imaging Library):

>>> from color64 import extractColor64
>>> qry_vec=extractColor64("testimages/3546946799.jpg")
>>> print " ".join(map(str,qry_vec))
0.256603866816 0.0778646543622 0.0402022078633 0.136827707291 0.0814508423209
0.169182538986 0.167500168085 0.198000192642 0.170579746366 0.0639530494809 
0.186014354229 0.0420079790056 0.144335106015 0.0430851057172 0.210729986429
0.115369670093 0.0579541511834 0.143578216434 0.168439865112 0.32083979249 
0.290238648653 0.0975492745638 0.187142521143 0.265635579824 0.351393431425 
0.316409766674 0.156695589423 0.127786517143 0.176223099232 0.0561240203679 
0.0 0.0 0.0 0.0 0.0444315187633 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
0.0809062495828 0.20950011909 0.220294803381 0.0990854352713 0.131724730134 
0.0877878144383 0.0838116928935 0.185533195734 0.531078636646 0.545228302479 
0.239922463894 0.321230560541 0.215354248881 0.195610255003 0.616132199764 
0.658942580223 0.369825750589 0.127457588911 0.118111990392 0.138596013188

Step 2. Do tag relevance learning:

>>> tagvotes = tagrel.estimate(qry_vec, qry_tags, qry_userid="15180636@N03") #qry_userid is optional
>>> print " ".join(["%s %s" % (tag, vote) for (tag,vote) in tagvotes])
nature 206.268514487 flower 190.418889273 bird 127.771361458 wildlife 79.6980342304 
yellow 52.6216690106 spring 38.3188724227 wild 23.2443306657 sigma 3.99006521424 
broad 0.879047528482 boulder 0.368624428 humming -0.0517176085111 tailed -0.0984302871662 
colorado -0.769546336475 rocky -0.785773987378 public -2.51330894264

That is all it takes.
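One direct use of the learned votes is to denoise an image's own tag list: keep only the tags that received more neighbor votes than expected by chance, i.e., a positive vote. A minimal sketch (denoise_tags and the sample votes below are illustrative, not part of the package):

```python
def denoise_tags(tagvotes, min_vote=0.0):
    """Keep the tags whose accumulated vote exceeds min_vote,
    ordered from most to least relevant."""
    ranked = sorted(tagvotes, key=lambda tv: tv[1], reverse=True)
    return [tag for tag, vote in ranked if vote > min_vote]

sample = [("nature", 206.3), ("public", -2.5), ("flower", 190.4)]
print(denoise_tags(sample))  # ['nature', 'flower']
```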

Suggestion for automated image annotation by neighbor voting:

>>> tagrel.set_nr_autotags(10) #set the number of tags to be predicted to 10
>>> tagvotes = tagrel.estimate(qry_vec, qry_tags="", qry_userid="")
>>> print " ".join(["%s %s" % (tag, vote) for (tag,vote) in tagvotes])
nature 207.268514487 flower 190.418889273 macro 181.733672668 insect 168.922877368 
bird 128.771361458 butterfly 97.9625381 green 85.6242448812 wildlife 80.6980342304 
plant 53.7878193354 yellow 53.6216690106

Learn tag relevance for a collection of images in two steps

To process a collection of socially tagged images, the data has to be organized in the following way. Assuming that the VisualSearch folder is placed at C:/VisualSearch/, set the variable ROOT_PATH to "C:/VisualSearch/". Given a test collection, say "test20", prepare its (raw) tag file and place it at C:/VisualSearch/test20/TextData/id.userid.rawtags.txt. Each line of the tag file stores the metadata of one image, i.e., photoid, userid, and raw tags, separated by tab characters; see test20/TextData/id.userid.rawtags.txt for an example. We suggest applying the chosen tpp technique, i.e., tag lemmatization or stemming, to your tag file beforehand:

$ python --tpp lemm --inputFile id.userid.rawtags.txt --outputFile id.userid.lemmtags.txt
$ python --tpp stem --inputFile id.userid.rawtags.txt --outputFile id.userid.stemtags.txt

Notice that this preprocessing step requires the Natural Language Toolkit (NLTK).
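For reference, the tab-separated tag file described above can be read as follows (read_tagfile is an illustrative helper, not part of the package):

```python
def read_tagfile(path):
    """Parse id.userid.rawtags.txt: each line holds photoid, userid, and
    the space-separated tags, joined by tab characters."""
    records = []
    with open(path) as fh:
        for line in fh:
            photoid, userid, rawtags = line.rstrip("\n").split("\t")
            records.append((photoid, userid, rawtags.split()))
    return records
```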

Step 1. Feature extraction. Suppose the image data is stored in the folder C:/VisualSearch/test20/ImageData (multiple folders are separated by semicolons):

$ python --imageFolders C:/VisualSearch/test20/ImageData --collection test20 --overwrite 1 
$ python --collection test20 --overwrite 1 

The script uses image filenames (without extension) as unique ids, which will be saved to C:/VisualSearch/test20/ImageSets/test20.txt.
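The id-building step can be mimicked as follows; this is an illustrative sketch, and the .jpg filename pattern is an assumption:

```python
import glob
import os

def collect_image_ids(image_folder):
    """Derive unique ids from image filenames (without extension), as the
    feature-extraction step does when writing ImageSets/<collection>.txt."""
    ids = set()
    for path in glob.glob(os.path.join(image_folder, "*.jpg")):
        ids.add(os.path.splitext(os.path.basename(path))[0])
    return sorted(ids)
```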

Step 2. Do tag relevance learning (using tags preprocessed with the corresponding tpp technique):

$ python --k 1000 --tpp raw  --testCollection test20 
$ python --k 1000 --tpp stem --testCollection test20 
$ python --k 1000 --tpp lemm --testCollection test20 

The result file will be saved to C:/VisualSearch/test20/tagrel/test20/flickr1m/color64,knn,1000,lemm/id.tagvotes.txt. Done!
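Assuming each line of id.tagvotes.txt stores the photoid followed by alternating tag and vote tokens (an assumption inferred from the interactive output earlier), the result file can be loaded like this:

```python
def load_tagvotes(path):
    """Load an id.tagvotes.txt file into {photoid: [(tag, vote), ...]}.
    Assumed line format: photoid tag1 vote1 tag2 vote2 ..."""
    result = {}
    with open(path) as fh:
        for line in fh:
            tokens = line.split()
            photoid, rest = tokens[0], tokens[1:]
            result[photoid] = [(rest[i], float(rest[i + 1]))
                               for i in range(0, len(rest), 2)]
    return result
```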

Suggestion for working on subsets: to do tag relevance learning for only a subset of a large collection, create a new file, say subset.txt, containing the photoids of the subset, place it in the ImageSets folder, and run:

$ python --k 1000 --tpp lemm  --testCollection test20 --testset subset

The result file will be saved to C:/VisualSearch/test20/tagrel/subset/flickr1m/color64,knn,1000,lemm/id.tagvotes.txt.
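Creating such a subset file takes only a few lines (the photoids below are hypothetical):

```python
# Write one photoid per line to a subset file for the ImageSets folder.
subset_ids = ["3546946799", "2222222222"]  # hypothetical photoids
with open("subset.txt", "w") as fh:
    fh.write("\n".join(subset_ids) + "\n")
```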

Suggestion for automated image annotation: given a test collection, say "annotateImages", if no tag file is present in its TextData folder, the system will do automated image annotation:

$ python --k 1000 --r 100 --tpp lemm  --testCollection annotateImages

The result file will be saved to C:/VisualSearch/annotateImages/autotags/annotateImages/flickr1m/color64,knn,1000,lemm/id.tagvotes.txt.


If you have any questions, please contact Xirong Li at

Readme First
Xirong Li, Cees G. M. Snoek, and Marcel Worring. Learning Social Tag Relevance by Neighbor Voting. IEEE Transactions on Multimedia, vol. 11, iss. 7, pp. 1310-1322, 2009.