Publications on color document image analysis
present results on small, non-public
datasets. In this paper we propose a well-defined and
groundtruthed color dataset consisting of over 1000
pages, with associated tools for evaluation. Because we focus
on aspects specific to color documents, the ground truth
omits the textual content of the documents. The
color data groundtruthing and evaluation tools are based
on a well-defined document model, complexity measures
to assess the inherent difficulty of analyzing a page, and
well-founded evaluation measures. Together they form a
suitable basis for evaluating diverse applications in color
document analysis. Both the dataset and the tools are
available through our Web site.
@Article{TodoranIJDAR2005,
author = "Todoran, L. and Worring, M. and Smeulders, A. W. M.",
  title = "The UvA Color Document Dataset",
journal = "International Journal of Document Analysis and Recognition",
number = "4",
volume = "7",
pages = "228--240",
year = "2005",
url = "https://ivi.fnwi.uva.nl/isis/publications/2005/TodoranIJDAR2005",
pdf = "https://ivi.fnwi.uva.nl/isis/publications/2005/TodoranIJDAR2005/TodoranIJDAR2005.pdf",
has_image = 1
}