Segmentation-Free Nastalique Urdu OCR

Much of the Urdu data that is available electronically exists only as images, which are very difficult to process. The root of the problem is that this data originates as printed text. Rapid progress of the Urdu language therefore requires an OCR system that can convert such images into machine-readable text and make Urdu data available to the common person. Research on automating the recognition of Arabic and Urdu script has been carried out for years. However, the biggest hurdle in developing Urdu OCR is recognizing the Nastalique script, which is the standard for writing Urdu. Nastalique is written diagonally with no fixed baseline, which makes it complex to process because techniques that rely on a single horizontal baseline cannot be applied directly. Characters overlap within ligatures, and neighboring ligatures overlap one another as well, so the text often cannot be split cleanly with vertical cuts. This paper proposes a segmentation-free method that allows successful recognition of the Nastalique script.
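The point about overlapping ligatures can be made concrete with a toy example. The sketch below is illustrative only and is not the method proposed in this paper. It assumes a binarized image stored as a list of rows with 1 for ink, and it treats each 8-connected ink region as one ligature body; real Nastalique text also has dots and diacritics that form separate regions and would have to be attached to their ligatures, which this sketch ignores. The names `TOY`, `column_gaps`, `connected_components` and `column_extent` are hypothetical. Splitting a line at empty columns of its vertical projection finds no cut point here, because the two diagonally stacked shapes share a column, while connected-component labeling still separates them.

```python
from collections import deque

# Toy binarized image (hypothetical): 1 = ink, 0 = background, rows top to bottom.
# Two separate "ligatures" are stacked diagonally so their column ranges overlap.
TOY = [
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]


def column_gaps(img):
    """Return indices of columns with no ink, i.e. candidate vertical cut points."""
    return [c for c in range(len(img[0])) if all(row[c] == 0 for row in img)]


def connected_components(img):
    """Label 8-connected ink regions with BFS; return one pixel list per region.

    Assumption: each region stands for one ligature body. Real Nastalique dots
    and diacritics form their own regions and are not handled here.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # Visit all 8 neighbours that are ink and not yet labelled.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def column_extent(pixels):
    """Return the (leftmost, rightmost) column occupied by a component."""
    xs = [x for _, x in pixels]
    return min(xs), max(xs)


if __name__ == "__main__":
    print(column_gaps(TOY))                    # [] : no empty column to cut at
    comps = connected_components(TOY)
    print(len(comps))                          # 2
    print([column_extent(p) for p in comps])   # [(4, 6), (0, 4)]
```

Here `column_gaps(TOY)` returns an empty list, yet two components are found whose column ranges, (4, 6) and (0, 4), overlap at column 4. This suggests why working on whole ligatures, rather than on vertical cuts through the text line, suits a diagonal script such as Nastalique.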
