Identification of Printed Punjabi Words and English Numerals Using Gabor Features

Script identification is one of the challenging steps in developing optical character recognition systems for bilingual or multilingual documents. In this paper, an attempt is made to identify English numerals at the word level in Punjabi documents using Gabor features. A support vector machine (SVM) classifier with five-fold cross-validation is used to classify the word images. The results are encouraging: the average accuracy with the RBF, polynomial, and linear kernel functions is greater than 99%.
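The feature-extraction and validation steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel size, wavelengths, number of orientations, the choice of mean and standard deviation as per-filter statistics, and the function names `gabor_kernel`, `filter_response`, `gabor_features`, and `five_fold_indices` are all assumptions made for illustration. The SVM itself is omitted; any SVM library could be trained on the feature vectors of each training fold and scored on the matching test fold.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) part of a 2-D Gabor kernel; size is assumed odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def filter_response(image, kernel):
    """Valid-mode correlation of a grayscale image (list of rows) with a kernel."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    acc += image[i + u][j + v] * kernel[u][v]
            out.append(acc)
    return out

def gabor_features(image, wavelengths=(4.0, 8.0), n_orient=4, size=7):
    """Mean and standard deviation of each filter response (assumed statistics)."""
    feats = []
    for lam in wavelengths:
        for o in range(n_orient):
            theta = o * math.pi / n_orient
            kernel = gabor_kernel(size, lam, theta, sigma=0.5 * lam)
            resp = filter_response(image, kernel)
            mean = sum(resp) / len(resp)
            var = sum((r - mean) ** 2 for r in resp) / len(resp)
            feats.extend([mean, math.sqrt(var)])
    return feats

def five_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; sample i is assigned to fold i % k."""
    folds = [list(range(f, n_samples, k)) for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]
```

With the default arguments each word image (at least 7 x 7 pixels) yields a 16-dimensional vector: 2 wavelengths x 4 orientations x 2 statistics. In practice the word images would be shuffled before being split into folds so that each fold contains both Punjabi words and English numerals.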
