A Study of Touching Characters in Degraded Gurmukhi Text

Character segmentation is an important preprocessing step for text recognition. In degraded documents, the presence of touching characters drastically reduces the recognition rate of any optical character recognition (OCR) system. In this paper, touching Gurmukhi characters are studied and, after careful analysis, divided into various categories. The categories are defined using the structural properties of Gurmukhi characters. New algorithms are proposed to segment touching characters in the middle zone (the region below the headline that holds the main body of the characters). These algorithms yield a reasonable improvement in segmenting touching characters in degraded Gurmukhi script. The algorithms proposed in this paper are applicable only to machine-printed text.
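To make the segmentation problem concrete, the sketch below shows a generic vertical-projection heuristic for splitting touching characters in a middle-zone image. It is NOT the algorithm proposed in this paper, which relies on Gurmukhi-specific structural categories. It assumes the headline has already been removed, that the input is a binary image (1 = ink) given as a list of rows, and that `max_width` is a hypothetical threshold (for example, derived from an estimated average character width) above which a column run is treated as two or more touching characters.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open column range [start, end)


def vertical_projection(img: List[List[int]]) -> List[int]:
    """Count ink pixels (1s) in each column of a binary image."""
    if not img:
        return []
    return [sum(row[c] for row in img) for c in range(len(img[0]))]


def column_segments(proj: List[int]) -> List[Segment]:
    """Group consecutive non-empty columns into runs separated by blank columns."""
    segments: List[Segment] = []
    start = None
    for c, v in enumerate(proj):
        if v > 0 and start is None:
            start = c
        elif v == 0 and start is not None:
            segments.append((start, c))
            start = None
    if start is not None:
        segments.append((start, len(proj)))
    return segments


def split_wide_segment(proj: List[int], seg: Segment, max_width: int) -> List[Segment]:
    """Recursively cut a run wider than max_width (hypothetical threshold)
    at the column with the fewest ink pixels, searching only its central part."""
    start, end = seg
    width = end - start
    if width <= max_width:
        return [seg]
    # Keep the cut away from the edges so both parts are non-empty
    # and recursion always terminates.
    margin = max(1, width // 4)
    cut = min(range(start + margin, end - margin + 1), key=lambda c: proj[c])
    return (split_wide_segment(proj, (start, cut), max_width)
            + split_wide_segment(proj, (cut, end), max_width))


def segment_middle_zone(img: List[List[int]], max_width: int) -> List[Segment]:
    """Return column ranges of candidate characters in a middle-zone image."""
    proj = vertical_projection(img)
    result: List[Segment] = []
    for seg in column_segments(proj):
        result.extend(split_wide_segment(proj, seg, max_width))
    return result


if __name__ == "__main__":
    # Two characters joined by a one-pixel bridge (column 4), then a separate one.
    img = [
        [1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1],
        [1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1],
        [1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1],
    ]
    print(segment_middle_zone(img, max_width=5))  # [(0, 4), (4, 9), (10, 13)]
```

A minimum-projection cut works only when the touching stroke is thinner than the characters it joins; in degraded text, touching pairs often violate this, which is one reason to use category-specific structural rules instead.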
