Abstract: Chinese idioms are traditional Chinese idiomatic expressions
with specific meanings and stereotyped structures. They are widely used
in classical Chinese and remain common in vernacular written and spoken
Chinese today. Currently, Chinese idioms are looked up in glossaries
through morphology- or pronunciation-based indexes of key characters or
key words, which cannot support searching by meaning. OCIRS is proposed
to find the desired idiom when users know only its meaning and have no
key character or key word. The user's request, given as a sentence or
phrase, is first analyzed by word segmentation, key word extraction and
semantic similarity computation, so that it can be mapped onto an idiom
domain ontology. The ontology is constructed to provide ample semantic
relations and to support description-logic-based reasoning for idiom
retrieval. The experimental evaluation shows that OCIRS supports
searching for idioms by semantics and achieves preliminary results that
meet users' requests.
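A minimal Python sketch of the request-analysis pipeline described above. It assumes a toy lexicon, a toy stop-word list and a flat dictionary standing in for the idiom ontology (all hypothetical); segmentation uses forward maximum matching and similarity uses the Jaccard coefficient, both swapped-in choices rather than the methods named by the authors. The description-logic reasoning that OCIRS performs over its ontology is not reproduced here.

```python
# Hypothetical toy resources; OCIRS's real lexicon and ontology are not given in the abstract.
LEXICON = {"多余", "事情", "等待", "侥幸"}
STOPWORDS = {"了", "的"}
# Flat stand-in for the idiom ontology: idiom -> set of meaning keywords (hypothetical).
IDIOM_MEANINGS = {
    "画蛇添足": {"多余", "事情", "弄巧成拙"},
    "守株待兔": {"侥幸", "等待", "不劳而获"},
}


def fmm_segment(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each position,
    falling back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_keywords(tokens, stopwords):
    """Keep content tokens only (a crude stand-in for key word extraction)."""
    return {t for t in tokens if t not in stopwords}


def jaccard(a, b):
    """Set overlap used here as the semantic similarity measure."""
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve_idiom(request):
    """Return (best idiom, similarity score) for a free-text request."""
    keywords = extract_keywords(fmm_segment(request, LEXICON), STOPWORDS)
    scored = {idiom: jaccard(keywords, meaning) for idiom, meaning in IDIOM_MEANINGS.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


print(fmm_segment("做了多余的事情", LEXICON))  # ['做', '了', '多余', '的', '事情']
print(retrieve_idiom("做了多余的事情"))        # ('画蛇添足', 0.5)
```

Here the request 做了多余的事情 ("did something superfluous") contains no character of the idiom itself, yet it maps to 画蛇添足 ("drawing legs on a snake") through shared meaning keywords.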
Abstract: In this paper we propose a segmentation system for unconstrained Arabic online handwriting, an essential problem for analytical (segmentation-based) word recognition systems. The system has two stages: the first is a specially designed hidden Markov model (HMM) and the second is a rule-based stage. Handwritten words are broken up into characters by simultaneous segmentation-recognition using HMMs of a unique design, trained on online features, most of which are novel. The character boundaries output by the HMM serve as proposed segmentation points (PSPs), which are then validated by a rule-based post-processing stage that corrects various segmentation errors without any help from contextual information. The HMM was designed and tested using a self-collected dataset (OHASD) [1]. Most error cases are corrected, and segmentation improves markedly. Very promising word and character segmentation rates are obtained, given the difficulty of unconstrained Arabic handwriting and the absence of contextual help.
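A minimal sketch of how HMM decoding can yield proposed segmentation points, assuming a toy discrete HMM with one state per character class and symbolic observation frames (the paper's online features and HMM topology are not described in the abstract). Viterbi decoding labels every frame, and a PSP is placed wherever the decoded label changes. The `validate_psps` rule, which drops points that would create a segment shorter than `min_len` frames, is hypothetical and only illustrates what a context-free rule-based post stage might look like.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for a discrete HMM, computed in log space."""
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at frame t.
            prev = max(states, key=lambda p: score[t - 1][p] + math.log(trans_p[p][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def proposed_segmentation_points(path):
    """A PSP is the index of the first frame of each new character label."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


def validate_psps(psps, n_frames, min_len=2):
    """Hypothetical rule: reject PSPs that would create a segment shorter than min_len frames."""
    kept, prev = [], 0
    for p in psps:
        if p - prev >= min_len and n_frames - p >= min_len:
            kept.append(p)
            prev = p
    return kept


# Toy two-character model (all probabilities are made up for illustration).
states = ["A", "B"]
start_p = {"A": 0.9, "B": 0.1}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.3, "B": 0.7}}
emit_p = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.2, "y": 0.8}}
obs = ["x", "x", "x", "y", "y", "y"]

path = viterbi(obs, states, start_p, trans_p, emit_p)
print(path)                                # ['A', 'A', 'A', 'B', 'B', 'B']
print(proposed_segmentation_points(path))  # [3]
print(validate_psps([1, 3, 4], len(obs)))  # [3]
```

Because segmentation and recognition come out of the same decoding pass, the boundary at frame 3 is a by-product of deciding which character each frame belongs to.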
Abstract: Along with advances in medicine, providing medical information to individual patients is becoming more important. In Japan, however, such information is rarely provided in Braille to blind and partially sighted people. We are therefore researching and developing “eBraille”, a Web-based automatic translation program that translates Japanese text into Japanese Braille. First, we analyzed the Japanese transcription rules in order to implement them in our program. We then added medical words to the program's dictionary to improve its translation accuracy for medical text. Finally, we examined the efficacy of statistical learning models (SLMs) for further increasing word segmentation accuracy in Braille translation. As a result, eBraille achieved the highest translation accuracy in a comparison with other translation programs, improved accuracy for medical text, and is used to produce Braille hospital brochures for outpatients and inpatients.
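The effect of adding medical words to the dictionary can be illustrated with a minimal sketch of dictionary-driven word segmentation. It assumes the input has already been converted to kana (Japanese Braille is written from the reading, not the kanji), uses hypothetical word counts, and finds the split with the highest unigram probability by dynamic programming. This unigram segmenter is a swapped-in, much simpler model than the SLMs the authors evaluated, and it does not apply the Japanese Braille spacing rules.

```python
import math


def segment(text, counts, max_len=8, unknown_logp=math.log(1e-6)):
    """Split text into the word sequence with the highest unigram log-probability.
    Characters not covered by the dictionary fall back to single-character words."""
    total = sum(counts.values())
    best = [0.0] + [-math.inf] * len(text)  # best[i]: score of the best split of text[:i]
    back = [0] * (len(text) + 1)            # back[i]: start index of the last word in that split
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            if word in counts:
                logp = math.log(counts[word] / total)
            elif len(word) == 1:
                logp = unknown_logp
            else:
                continue
            if best[j] + logp > best[i]:
                best[i], back[i] = best[j] + logp, j
    words, i = [], len(text)
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Hypothetical counts for a general-purpose dictionary.
GENERAL = {"とう": 50, "にょう": 5, "びょう": 20, "の": 200, "ちりょう": 10}
# The same dictionary after adding one medical term (とうにょうびょう, "diabetes").
MEDICAL = {**GENERAL, "とうにょうびょう": 8}

text = "とうにょうびょうのちりょう"  # "treatment of diabetes", in kana
print(segment(text, GENERAL))  # ['とう', 'にょう', 'びょう', 'の', 'ちりょう']
print(segment(text, MEDICAL))  # ['とうにょうびょう', 'の', 'ちりょう']
```

Without the medical entry, the term for diabetes is broken into three unrelated general words; a single dictionary addition is enough to keep it as one unit, which is the kind of error a Braille transcription would otherwise inherit.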
Abstract: Optical character recognition of cursive scripts
presents a number of challenging problems in both segmentation and
recognition processes in different languages, including Persian. To
overcome these problems, we combine a newly developed Persian word
segmentation method with a recognition-based segmentation technique.
The method is robust as well as flexible, and it increases the system's
tolerance to font variations. Results of applying the method to a
comprehensive database show a high degree of accuracy that meets the
requirements for commercial use. Extended with suitable pre- and
post-processing, the method offers a simple and fast framework for
developing a full OCR system.
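Recognition-based segmentation can be sketched as a search over candidate cuts in which each candidate segment is scored by a character recognizer. The minimal Python sketch below assumes the word image has already been over-segmented into `n_primitives` strokes and replaces the recognizer with a hypothetical table of (label, confidence) pairs using Latin placeholder labels; dynamic programming then picks the split with the highest product of confidences. The abstract does not say how the authors score or search segmentations, so this illustrates the general technique only.

```python
import math


def recognition_based_segmentation(n_primitives, scores):
    """Choose cut points that maximize the product of recognizer confidences.

    scores maps (start, end) primitive ranges to (label, confidence).
    Returns (labels, cuts), or None if no complete segmentation exists.
    """
    best = [0.0] + [-math.inf] * n_primitives
    back = [None] * (n_primitives + 1)
    for end in range(1, n_primitives + 1):
        for start in range(end):
            if (start, end) not in scores or best[start] == -math.inf:
                continue
            label, conf = scores[(start, end)]
            cand = best[start] + math.log(conf)
            if cand > best[end]:
                best[end], back[end] = cand, (start, label)
    if back[n_primitives] is None:
        return None
    labels, cuts, end = [], [], n_primitives
    while end > 0:
        start, label = back[end]
        labels.append(label)
        cuts.append(start)
        end = start
    return labels[::-1], sorted(c for c in cuts if c > 0)


# Hypothetical recognizer output for a word over-segmented into 4 primitives.
SCORES = {
    (0, 1): ("b", 0.9), (0, 2): ("n", 0.2),
    (1, 2): ("y", 0.3), (1, 3): ("s", 0.8),
    (2, 3): ("y", 0.4), (2, 4): ("k", 0.1),
    (3, 4): ("t", 0.95),
}
print(recognition_based_segmentation(4, SCORES))  # (['b', 's', 't'], [1, 3])
```

The cut between primitives 2 and 3 is rejected not by any geometric rule but because the recognizer is more confident reading primitives 1 to 3 as a single character.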
Abstract: Efficient preprocessing is essential for the automatic
recognition of handwritten documents. In this paper, techniques for
segmenting words in handwritten Arabic text are presented. First,
connected components (CCs) are extracted, and the distances between
different components are analyzed. The statistical distribution of
these distances is then used to determine an optimal threshold for word
segmentation. In addition, an improved projection-based method is
employed for baseline detection. The proposed method has been
successfully tested on the IFN/ENIT database, which consists of 26459
Arabic words handwritten by 411 different writers, and the results
are promising and encouraging, showing more accurate baseline
detection and word segmentation for further recognition.
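A minimal sketch of the two steps, under stated assumptions: connected components are reduced to hypothetical horizontal bounding-box extents, the gap threshold is chosen with Otsu's method on the gap distribution (a swapped-in choice, since the abstract does not name the authors' threshold criterion), and the baseline is taken as the row with the most ink in the horizontal projection profile, i.e. the plain projection method rather than the authors' improved variant.

```python
def horizontal_gaps(boxes):
    """Gaps between neighbouring connected components, given (x_min, x_max) extents.
    Boxes are sorted by x_min; overlapping components get a gap of 0."""
    boxes = sorted(boxes)
    return [max(0, b[0] - a[1]) for a, b in zip(boxes, boxes[1:])]


def otsu_threshold(values):
    """Threshold t maximizing the between-class variance of {v <= t} vs {v > t}."""
    n = len(values)
    best_t, best_var = None, -1.0
    for t in sorted(set(values))[:-1]:  # the largest value would leave the upper class empty
        lower = [v for v in values if v <= t]
        upper = [v for v in values if v > t]
        w0, w1 = len(lower) / n, len(upper) / n
        mu0, mu1 = sum(lower) / len(lower), sum(upper) / len(upper)
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def baseline_row(binary_image):
    """Row index with the highest count of ink pixels (horizontal projection)."""
    profile = [sum(row) for row in binary_image]
    return max(range(len(profile)), key=profile.__getitem__)


gaps = [2, 3, 2, 4, 15, 3, 18, 2, 16]  # hypothetical gaps between components, in pixels
t = otsu_threshold(gaps)
word_breaks = [i for i, g in enumerate(gaps) if g > t]
print(t, word_breaks)  # 4 [4, 6, 8]

image = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]
print(baseline_row(image))  # 2
```

Gaps above the threshold are treated as word boundaries, while smaller ones are taken as spaces between pieces of the same Arabic word, which is why a data-driven threshold matters more than a fixed pixel value.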