Recognition-based Segmentation in Persian Character Recognition

Optical character recognition of cursive scripts presents a number of challenging problems in both segmentation and recognition processes in different languages, including Persian. In order to overcome these problems, we use a newly developed Persian word segmentation method and a recognition-based segmentation technique to overcome its segmentation problems. This method is robust as well as flexible. It also increases the system-s tolerances to font variations. The implementation results of this method on a comprehensive database show a high degree of accuracy which meets the requirements for commercial use. Extended with a suitable pre and post-processing, the method offers a simple and fast framework to develop a full OCR system.

Urdu Nastaleeq Optical Character Recognition

This paper discusses the Urdu script characteristics, Urdu Nastaleeq and a simple but a novel and robust technique to recognize the printed Urdu script without a lexicon. Urdu being a family of Arabic script is cursive and complex script in its nature, the main complexity of Urdu compound/connected text is not its connections but the forms/shapes the characters change when it is placed at initial, middle or at the end of a word. The characters recognition technique presented here is using the inherited complexity of Urdu script to solve the problem. A word is scanned and analyzed for the level of its complexity, the point where the level of complexity changes is marked for a character, segmented and feeded to Neural Networks. A prototype of the system has been tested on Urdu text and currently achieves 93.4% accuracy on the average.