This paper studies automatic image classification
by modeling soft-assignment in the popular codebook model.
The codebook model describes an image as a bag of discrete
visual words selected from a vocabulary, where the frequency
distributions of visual words in an image allow classification.
One inherent component of the codebook model is the assignment
of discrete visual words to continuous image features.
Despite the clear mismatch of this hard assignment with the
nature of continuous features, the approach has been applied
successfully for some years. In this paper we investigate four
types of soft-assignment of visual words to image features. We
demonstrate that explicitly modeling visual word assignment
ambiguity improves classification performance compared to the
hard-assignment of the traditional codebook model. We compare
the traditional codebook model against our method on five
well-known datasets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes severely degrade the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method benefits from high-dimensional feature spaces and yields greater gains
as the number of image categories increases.
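The contrast between hard and soft assignment described above can be sketched in code. This is a minimal, hypothetical illustration (function names and the NumPy implementation are my own, not from the paper): hard assignment gives each feature's full vote to its single nearest visual word, while one soft-assignment variant, a Gaussian kernel codebook, spreads each feature's vote over all words according to distance, modeling assignment ambiguity.

```python
import numpy as np

def hard_assignment_histogram(features, codebook):
    """Traditional codebook model: each continuous feature votes for
    its single nearest discrete visual word (hard assignment)."""
    # Pairwise squared distances between features and codewords.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def soft_assignment_histogram(features, codebook, sigma=1.0):
    """Kernel codebook sketch: each feature distributes its vote over
    all visual words with a Gaussian kernel of its distance to each
    word, so ambiguous features contribute to several words."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)  # normalize each feature's vote
    hist = w.sum(axis=0)
    return hist / hist.sum()
```

With a small `sigma` the soft histogram approaches the hard one; a larger `sigma` smooths mass over neighboring words, which is the ambiguity-modeling effect the abstract refers to.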
@Article{vanGemertTPAMI2010,
author = "van Gemert, J. C. and Veenman, C. J. and Smeulders, A. W. M. and Geusebroek, J. M.",
title = "Visual Word Ambiguity",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
number = "7",
volume = "32",
pages = "1271--1283",
year = "2010",
url = "https://ivi.fnwi.uva.nl/isis/publications/2010/vanGemertTPAMI2010",
pdf = "https://ivi.fnwi.uva.nl/isis/publications/2010/vanGemertTPAMI2010/vanGemertTPAMI2010.pdf",
has_image = 1
}