Segmentation Problems and Solutions in Printed Degraded Gurmukhi Script

Character segmentation is an important preprocessing step for text recognition. In degraded documents, touching characters drastically lower the recognition rate of any optical character recognition (OCR) system. In this paper, we propose a complete solution for segmenting touching characters in all three zones of printed Gurmukhi script. We study touching Gurmukhi characters and, after careful analysis, divide them into categories defined by the structural properties of the characters. We then propose new algorithms to segment touching characters in the middle, upper, and lower zones. These algorithms yield a reasonable improvement in segmenting touching characters in degraded printed Gurmukhi script, although they apply only to machine-printed text. We also present a new technique for segmenting horizontally overlapping lines.
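The abstract does not spell out the proposed algorithms. The sketch below is therefore not the authors' method. It shows only the conventional projection-profile baseline that touching characters defeat. A Gurmukhi text line has an upper zone above the headline, a middle zone between the headline and the baseline, and a lower zone below the baseline. The sketch rests on several assumptions. The page is a binarized list of 0/1 rows. The headline is the densest row of a text line and is exactly `headline_thickness` rows thick. A run of columns wider than a hypothetical threshold `max_width` is flagged as a possible touching pair. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: a binarized page as rows of 0/1 pixels (1 = ink).
Image = List[List[int]]


def row_profile(img: Image) -> List[int]:
    """Horizontal projection profile: ink pixels per row."""
    return [sum(row) for row in img]


def col_profile(img: Image, top: int, bottom: int) -> List[int]:
    """Vertical projection profile over rows top..bottom-1."""
    width = len(img[0]) if img else 0
    return [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Maximal half-open intervals [start, end) where the profile is non-zero."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def headline_row(img: Image, top: int, bottom: int) -> int:
    """Assumption: the headline is the densest row of the text line."""
    profile = row_profile(img)
    return max(range(top, bottom), key=lambda r: profile[r])


def segment_line(img: Image, top: int, bottom: int, max_width: int,
                 headline_thickness: int = 1) -> List[Tuple[int, int, bool]]:
    """Split the part of a text line below the headline into column runs.

    Returns (start, end, touching_candidate) triples. A run wider than the
    hypothetical threshold max_width is flagged as a possible touching pair.
    """
    below = headline_row(img, top, bottom) + headline_thickness
    return [(s, e, e - s > max_width)
            for s, e in runs(col_profile(img, below, bottom))]


# Usage on a toy line: row 1 is the headline. A stroke at row 3 joins
# the two glyphs in columns 4-8, so no blank column separates them.
DEMO: Image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

for top, bottom in runs(row_profile(DEMO)):
    print(segment_line(DEMO, top, bottom, max_width=3))
# prints [(1, 3, False), (4, 9, True)]
```

On the toy line, the narrow glyph in columns 1-2 separates cleanly. The joined strokes in columns 4-8 instead yield one segment wider than `max_width`, and touching-character segmentation must resolve exactly this situation. The same `runs(row_profile(...))` line split also merges two text lines whenever they overlap horizontally, because no blank row separates them.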
