Abstract: Optical character recognition of cursive scripts
presents a number of challenging problems in both segmentation and
recognition processes in different languages, including Persian. To
overcome these problems, we use a newly developed Persian
word segmentation method together with a recognition-based
segmentation technique to handle the segmentation problems. This method is
robust as well as flexible. It also increases the system's tolerance to
font variations. The results of applying this method to a
comprehensive database show a high degree of accuracy, which meets
the requirements for commercial use. Extended with suitable pre-
and post-processing, the method offers a simple and fast framework
for developing a full OCR system.
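As an illustration of recognition-based segmentation in general, the sketch below picks, among candidate cut points, the split whose pieces a recognizer scores best. It is not the authors' method: the function `best_segmentation`, the callback `score`, and the toy score table are all hypothetical, and the scores are assumed to be log-confidences that can be summed.

```python
def best_segmentation(cuts, score):
    """Choose the best split of a word image by dynamic programming.

    cuts  -- sorted candidate cut positions, including start and end
    score -- hypothetical recognizer: score(a, b) -> (label, log_confidence)
             for the piece of the word between positions a and b
    Returns (total_log_confidence, [(a, b, label), ...]).
    """
    n = len(cuts)
    # best[j] holds the best (total, pieces) for the prefix ending at cuts[j]
    best = [(float("-inf"), [])] * n
    best[0] = (0.0, [])
    for j in range(1, n):
        for i in range(j):
            label, log_conf = score(cuts[i], cuts[j])
            total = best[i][0] + log_conf
            if total > best[j][0]:
                best[j] = (total, best[i][1] + [(cuts[i], cuts[j], label)])
    return best[n - 1]


# Toy recognizer (assumption): a lookup table standing in for a classifier.
TOY_SCORES = {
    (0, 4): ("A", -0.1), (4, 7): ("B", -0.2), (7, 12): ("C", -0.1),
    (0, 7): ("X", -2.0), (4, 12): ("Y", -1.5), (0, 12): ("Z", -3.0),
}


def toy_score(a, b):
    return TOY_SCORES[(a, b)]


total, pieces = best_segmentation([0, 4, 7, 12], toy_score)
labels = [label for _, _, label in pieces]
```

In this toy case the three-piece split wins because each of its pieces is recognized with high confidence, whereas the merged pieces score poorly. A real system would replace `toy_score` with a trained character classifier.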
Abstract: This paper discusses the characteristics of the Urdu script,
Urdu Nastaleeq, and a simple yet novel and robust technique to
recognize printed Urdu script without a lexicon. Urdu, being a
member of the Arabic script family, is cursive and complex by nature. The
main complexity of Urdu compound/connected text lies not in its
connections but in the forms/shapes that a character takes when it is
placed at the initial, middle, or final position of a word. The character
recognition technique presented here uses the inherent
complexity of Urdu script to solve the problem. A word is scanned
and analyzed for its level of complexity; each point where the level
of complexity changes is marked as a character boundary, and the
resulting segments are fed to neural networks. A prototype of the
system has been tested on Urdu text and currently achieves 93.4%
accuracy on average.
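The idea of cutting a word where its complexity level changes can be sketched as follows. The abstract does not define the complexity measure, so this sketch assumes one: the number of separate ink runs in each pixel column of a binary word image. The function names and the sample image are hypothetical, and the neural network stage is omitted.

```python
def column_complexity(image):
    """Count vertical ink runs per column (assumed complexity measure).

    image -- binary word image as a list of rows, 1 = ink, 0 = background
    """
    height = len(image)
    width = len(image[0]) if height else 0
    levels = []
    for x in range(width):
        runs = 0
        previous = 0
        for y in range(height):
            pixel = image[y][x]
            if pixel == 1 and previous == 0:
                runs += 1  # a new ink run starts in this column
            previous = pixel
        levels.append(runs)
    return levels


def change_points(levels):
    """Column indices where the complexity level differs from the previous column."""
    return [x for x in range(1, len(levels)) if levels[x] != levels[x - 1]]


def segment(image):
    """Split the image into vertical strips at every complexity change."""
    levels = column_complexity(image)
    cuts = [0] + change_points(levels) + [len(levels)]
    return [[row[a:b] for row in image] for a, b in zip(cuts, cuts[1:])]


# Tiny hypothetical word image: 3 rows by 5 columns.
word = [
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0],
]
segments = segment(word)
```

Each strip in `segments` would then be normalized and passed to a classifier. Cutting at every change is the simplest possible rule; a practical system would likely filter out spurious changes caused by noise.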