A Framework for Urdu Language Translation using LESSA

The Internet is one of the major sources of information for people in almost all fields of life. The major language used to publish information on the Internet is English. This becomes a problem in a country like Pakistan, where Urdu is the national language. Only 10% of the population of Pakistan can understand English. As a result, millions of people are deprived of the valuable information available on the Internet. This paper presents a system for translation from English to Urdu. A module, LESSA, uses a rule-based algorithm to read input text in English, understand it, and translate it into Urdu. The designed approach was further extended to translate a complete website from English to Urdu. An option appears in the browser to translate the webpage in a new window. The designed system will help millions of Internet users benefit from the Internet and access the latest information and knowledge posted there daily.

A Novel Arabic Text Steganography Method Using Letter Points and Extensions

This paper presents a new steganography approach suitable for Arabic texts. It can be classified under feature coding steganography methods. The approach hides secret information bits within the letters, benefiting from their inherent points. To mark the specific letters holding secret bits, the scheme considers two features: the existence of points in the letters and the redundant Arabic extension character. We use the pointed letters with extension to hold the secret bit 'one' and the un-pointed letters with extension to hold 'zero'. This steganography technique is also attractive for other languages with scripts similar to Arabic, such as Persian and Urdu.
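The encoding rule described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `embed` and `extract`, the two letter sets, and the policy of skipping any letter whose pointedness does not match the next secret bit are all assumptions made for illustration. The sketch only places an extension (Unicode tatweel, U+0640) after a letter that joins to a following letter, and it ignores ligatures and diacritics.

```python
KASHIDA = "\u0640"  # Arabic extension character (tatweel)

# Assumed letter sets: only letters that connect to the following letter,
# because an extension can only be drawn after a connecting letter.
POINTED = set("بتثجخشضظغفقني")
UNPOINTED = set("حسصطعكلمه")


def is_letter(ch: str) -> bool:
    """Rough test for an Arabic letter that the previous letter can join to."""
    return "\u0622" <= ch <= "\u064A" and ch != KASHIDA


def embed(cover: str, bits: str) -> str:
    """Hide bits: extension after a pointed letter = 1, after an un-pointed one = 0."""
    if KASHIDA in cover:
        raise ValueError("cover text must not already contain extensions")
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    out = []
    i = 0  # index of the next secret bit to hide
    for pos, ch in enumerate(cover):
        out.append(ch)
        if i == len(bits):
            continue  # all bits hidden; copy the rest unchanged
        nxt = cover[pos + 1] if pos + 1 < len(cover) else ""
        if not is_letter(nxt):
            continue  # letter ends a word, so no extension is possible
        if (ch in POINTED and bits[i] == "1") or (ch in UNPOINTED and bits[i] == "0"):
            out.append(KASHIDA)
            i += 1
    if i < len(bits):
        raise ValueError("cover text is too short for the secret bits")
    return "".join(out)


def extract(stego: str) -> str:
    """Recover bits by inspecting the letter before each extension."""
    bits = []
    for prev, ch in zip(stego, stego[1:]):
        if ch == KASHIDA:
            if prev in POINTED:
                bits.append("1")
            elif prev in UNPOINTED:
                bits.append("0")
    return "".join(bits)


cover = "بسم الله الرحمن الرحيم"
stego = embed(cover, "101")
recovered = extract(stego)
```

Because letters that do not match the current bit are left without an extension, the receiver needs no side information: every extension in the stego text carries exactly one bit. The cost of this assumed policy is capacity, since many candidate letters are skipped.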

Urdu Nastaleeq Optical Character Recognition

This paper discusses the characteristics of the Urdu script and of Urdu Nastaleeq, and presents a simple yet novel and robust technique for recognizing printed Urdu script without a lexicon. Urdu, which belongs to the Arabic script family, is cursive and complex in nature. The main complexity of Urdu compound (connected) text lies not in its connections but in the shapes a character takes when it is placed at the initial, middle, or final position of a word. The character recognition technique presented here uses the inherent complexity of the Urdu script to solve the problem. A word is scanned and analyzed for its level of complexity; the point where the level of complexity changes is marked for a character, which is then segmented and fed to neural networks. A prototype of the system has been tested on Urdu text and currently achieves 93.4% accuracy on average.