A Study of the Variability of Very Low Resolution Characters and the Feasibility of Their Discrimination Using Geometrical Features
Current OCR technology does not allow accurate
recognition of small text images, such as those found
in web images. Our goal is to investigate new approaches for
recognizing very low resolution text images containing antialiased
character shapes.
This paper presents a preliminary study on the variability of
such characters and the feasibility of discriminating them
using geometrical features. In a first stage, we analyze the
distribution of these features. In a second stage, we study
their discriminative power for recognizing isolated
characters produced with various rendering methods and font
properties. Finally, we present the results of our
evaluation tests, from which we draw our conclusions and outline future work.
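As a rough illustration only: the abstract does not list which geometrical features are used, so the sketch below assumes a few generic ones (ink mass, bounding-box width and height, aspect ratio, ink density, central second-order moments) computed directly on an antialiased gray-level glyph, using pixel values as weights rather than binarizing. To quantify discriminative power between two character classes it swaps in the Fisher ratio, a common criterion that is not necessarily the one applied in the paper. The function names `geometric_features` and `fisher_ratio` and the toy glyphs are hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical representation: a glyph is a list of rows of ink values in
# [0, 1], where 0 is background, 1 is full ink, and antialiased edge pixels
# take intermediate values.
Glyph = list[list[float]]


def geometric_features(glyph: Glyph) -> dict[str, float]:
    """Gray-level-weighted geometric features of one glyph (illustrative set).

    Pixel values act as weights instead of being thresholded, so the partial
    coverage produced by antialiasing contributes to every feature.
    """
    mass = sx = sy = 0.0
    xs: list[int] = []  # columns of pixels carrying any ink
    ys: list[int] = []  # rows of pixels carrying any ink
    for y, row in enumerate(glyph):
        for x, v in enumerate(row):
            if v > 0:
                xs.append(x)
                ys.append(y)
            mass += v
            sx += v * x
            sy += v * y
    if not xs:
        raise ValueError("glyph contains no ink")
    cx, cy = sx / mass, sy / mass  # ink-weighted centroid

    # Central second-order moments (invariant to translation of the glyph).
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(glyph):
        for x, v in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += v * dx * dx
            mu02 += v * dy * dy
            mu11 += v * dx * dy

    width = max(xs) - min(xs) + 1  # bounding box of all inked pixels
    height = max(ys) - min(ys) + 1
    return {
        "mass": mass,
        "width": float(width),
        "height": float(height),
        "aspect": height / width,
        "density": mass / (width * height),  # mean ink inside the bounding box
        "mu20": mu20 / mass,
        "mu02": mu02 / mass,
        "mu11": mu11 / mass,
    }


def fisher_ratio(a: list[float], b: list[float]) -> float:
    """Fisher criterion (m_a - m_b)^2 / (var_a + var_b) for one feature.

    Larger values mean the feature separates the two classes better.
    """
    num = (mean(a) - mean(b)) ** 2
    den = pvariance(a) + pvariance(b)
    if den == 0:
        return float("inf") if num > 0 else 0.0
    return num / den


if __name__ == "__main__":
    # Toy glyphs made up for illustration; they are not data from the paper.
    bar = [[0.5, 1.0, 0.0] for _ in range(5)]  # "l"-like stroke, soft left edge
    ring = [[0.3, 0.9, 0.3], [0.9, 0.0, 0.9], [0.3, 0.9, 0.3]]  # "o"-like
    for name, g in (("bar", bar), ("ring", ring)):
        f = geometric_features(g)
        print(name, round(f["aspect"], 2), round(f["density"], 2))
```

In an actual study, each feature would be collected over many renderings of every character (different rendering methods, sizes, and font properties) so that its distribution per class can be examined, and `fisher_ratio` would then compare those per-class samples feature by feature.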
@article{ijiccs:57640,
  author = "Farshideh Einsele and Rolf Ingold",
  title = "A Study of the Variability of Very Low Resolution Characters and the Feasibility of Their Discrimination Using Geometrical Features",
  journal = "International Journal of Information, Control and Computer Sciences",
  abstract = "Current OCR technology does not allow accurate recognition of small text images, such as those found in web images. Our goal is to investigate new approaches for recognizing very low resolution text images containing antialiased character shapes. This paper presents a preliminary study on the variability of such characters and the feasibility of discriminating them using geometrical features. In a first stage, we analyze the distribution of these features. In a second stage, we study their discriminative power for recognizing isolated characters produced with various rendering methods and font properties. Finally, we present the results of our evaluation tests, from which we draw our conclusions and outline future work.",
  keywords = "World Wide Web, document analysis, pattern recognition, Optical Character Recognition",
  volume = "1",
  number = "6",
  pages = "1692-4",
}