Abstract: This paper presents an Optical Character Recognition
system for printed Urdu, a popular script in Pakistan and India. Urdu
is the third most widely understood language in the world, especially
in the subcontinent, yet little effort has been made to make it
understandable to computers. A large body of work in literature and
Islamic studies exists in Urdu and still has to be computerized. In the
proposed system, individual characters are recognized using our own
algorithms. The feature detection methods are
simple and robust. Supervised learning is used to train the
feed-forward neural network. A prototype of the system has been tested on
printed Urdu characters and currently achieves 98.3% character-level
accuracy on average. Although the system is independent of script and
language, we have designed it for Urdu characters only.
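
As an illustration of the supervised training step mentioned above, the following is a minimal sketch of a feed-forward network with one hidden layer trained by backpropagation. It is not the authors' implementation: the abstract does not specify the features, layer sizes, or learning rate, so the 3x3 toy "glyphs", the class name `FeedForwardNet`, the hidden layer size, and all hyperparameters below are assumptions chosen only to keep the example small.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FeedForwardNet:
    """One-hidden-layer network (hypothetical sizes, not from the paper)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row holds one weight per input plus a trailing bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
             for row in self.w1]
        o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0])))
             for row in self.w2]
        return h, o

    def train_step(self, x, target, lr):
        """One backpropagation update for squared error on one sample."""
        h, o = self.forward(x)
        d_out = [(o[k] - target[k]) * o[k] * (1.0 - o[k])
                 for k in range(len(o))]
        # Hidden deltas use the output weights before they are updated.
        d_hid = [h[j] * (1.0 - h[j]) *
                 sum(d_out[k] * self.w2[k][j] for k in range(len(o)))
                 for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v

    def predict(self, x):
        _, o = self.forward(x)
        return o.index(max(o))


# Toy labelled data: flattened 3x3 binary images standing in for
# character features (assumed; real features would come from the
# feature detection stage).
SAMPLES = [
    ([0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0], 0),  # vertical stroke
    ([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0], 1),  # horizontal stroke
    ([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0], 2),  # diagonal stroke
]

net = FeedForwardNet(n_in=9, n_hidden=6, n_out=3)
for _ in range(2000):
    for features, label in SAMPLES:
        one_hot = [1.0 if k == label else 0.0 for k in range(3)]
        net.train_step(features, one_hot, lr=0.5)

predictions = [net.predict(features) for features, _ in SAMPLES]
```

The supervision comes from the labelled pairs in `SAMPLES`: each update moves the weights so that the output for a feature vector approaches the one-hot code of its known class. A real system would train on many samples per character and measure accuracy on held-out data.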
Abstract: This paper discusses the characteristics of the Urdu script,
Urdu Nastaleeq, and a simple yet novel and robust technique to
recognize printed Urdu script without a lexicon. Urdu, a member of the
family of Arabic scripts, is cursive and complex by nature. The
main complexity of Urdu compound/connected text lies not in its
connections but in the forms/shapes a character takes when it is
placed at the beginning, middle, or end of a word. The character
recognition technique presented here uses the inherent
complexity of the Urdu script to solve the problem. A word is scanned
and analyzed for its level of complexity; each point where the level
of complexity changes is marked for a character, which is then
segmented and fed to neural networks. A prototype of the system has
been tested on Urdu text and currently achieves 93.4% accuracy on
average.
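
The scan-and-segment idea above can be sketched as follows. The abstract does not define how the "level of complexity" is measured, so this example assumes one possible measure: the number of separate ink runs in each pixel column of a binary word image. The function names `column_complexity`, `change_points`, and `segment`, and the toy image `WORD`, are hypothetical and serve only to show cutting a word at the columns where the assumed level changes.

```python
def column_complexity(image):
    """Return the number of ink runs in each column of a binary image.

    `image` is a list of rows of 0 (background) and 1 (ink). Counting
    ink runs is an assumed stand-in for the paper's complexity level.
    """
    height, width = len(image), len(image[0])
    levels = []
    for c in range(width):
        runs, prev = 0, 0
        for r in range(height):
            if image[r][c] == 1 and prev == 0:
                runs += 1  # a new ink run starts in this column
            prev = image[r][c]
        levels.append(runs)
    return levels


def change_points(levels):
    """Columns at which the complexity level differs from the previous one."""
    return [c for c in range(1, len(levels)) if levels[c] != levels[c - 1]]


def segment(image):
    """Split the image into (start, end) column spans at change points.

    Spans that contain no ink at all are dropped.
    """
    levels = column_complexity(image)
    cuts = [0] + change_points(levels) + [len(levels)]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if any(levels[a:b])]


# Toy word: a baseline stroke with a detached mark above columns 3-4.
WORD = [
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

levels = column_complexity(WORD)
spans = segment(WORD)
```

Each span in `spans` would then be cropped from the image and passed to the recognizer. Column indices here run left to right; since Urdu is written right to left, the spans may need to be reversed to follow reading order.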