A Study of the Variability of Very Low Resolution Characters and the Feasibility of Their Discrimination Using Geometrical Features

Current OCR technology cannot accurately recognize small text images, such as those found in web images. Our goal is to investigate new approaches to recognizing very low resolution text images containing antialiased character shapes. This paper presents a preliminary study of the variability of such characters and the feasibility of discriminating them using geometrical features. In the first stage, we analyze the distribution of these features. In the second stage, we study their discriminative power for recognizing isolated characters under various rendering methods and font properties. Finally, we present the results of our evaluation tests, which lead to our conclusions and directions for future work.



