Component-based Segmentation of Words from Handwritten Arabic Text

Efficient preprocessing is essential for the automatic recognition of handwritten documents. In this paper, techniques for segmenting words in handwritten Arabic text are presented. First, connected components (CCs) are extracted and the distances between different components are analyzed. The statistical distribution of these distances is then used to determine an optimal threshold for word segmentation. In addition, an improved projection-based method is employed for baseline detection. The proposed method has been successfully tested on the IFN/ENIT database, which consists of 26,459 Arabic words handwritten by 411 different writers. The results are promising, indicating more accurate baseline detection and word segmentation as input to subsequent recognition.
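The pipeline described above (extract connected components, measure the gaps between them, threshold the gap distribution to group components into words, and locate the baseline from a projection profile) can be illustrated with the minimal Python sketch below. It assumes a binary image given as a list of rows with 1 for ink and 0 for background. All function names are hypothetical. Two stand-ins are used: Otsu's between-class-variance criterion picks the gap threshold, and the baseline is simply the row with the most ink in the horizontal projection. Neither is claimed to be the paper's own threshold rule or its improved projection method. Gaps are measured between x-sorted bounding boxes, so the right-to-left writing direction of Arabic does not change their values.

```python
from collections import deque


def connected_components(img):
    """Label 8-connected ink pixels (value 1) by breadth-first flood fill and
    return one bounding box (x_min, x_max, y_min, y_max) per component."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, x1, y0, y1))
    return boxes


def horizontal_gaps(boxes):
    """Sort boxes by left edge and return them with the horizontal gap before
    each box after the first. The running right edge handles boxes that sit
    inside another's x-range (e.g. dots); overlaps yield a gap of 0."""
    ordered = sorted(boxes, key=lambda b: b[0])
    gaps = []
    right = ordered[0][1]
    for box in ordered[1:]:
        gaps.append(max(0, box[0] - right - 1))
        right = max(right, box[1])
    return ordered, gaps


def otsu_threshold(values):
    """Stand-in threshold rule: choose t maximising the between-class variance
    of {v <= t} and {v > t} (Otsu's criterion applied to 1-D gap values)."""
    candidates = sorted(set(values))
    n = len(values)
    best_t, best_score = candidates[0], -1.0
    for t in candidates[:-1]:
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        w0, w1 = len(low) / n, len(high) / n
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def segment_words(img):
    """Group components into words: a gap above the threshold starts a new word."""
    boxes = connected_components(img)
    if not boxes:
        return []
    ordered, gaps = horizontal_gaps(boxes)
    if not gaps:
        return [ordered]
    t = otsu_threshold(gaps)
    words = [[ordered[0]]]
    for box, gap in zip(ordered[1:], gaps):
        if gap > t:
            words.append([box])
        else:
            words[-1].append(box)
    return words


def baseline_row(img):
    """Plain horizontal projection: index of the row with the most ink pixels."""
    profile = [sum(row) for row in img]
    return max(range(len(profile)), key=profile.__getitem__)
```

Thresholding the gap distribution is reasonable here because an Arabic word often breaks into several disconnected pieces, so the gaps mix small within-word spaces with larger between-word spaces. A real line image would first be binarized, and the threshold would be estimated from gaps pooled over many components rather than a single short line.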
