Abstract: In this paper, an intelligent algorithm for optimal
document archiving is presented. It is known that electronic archives
are very important for information system management. Minimizing
the size of the data stored in an electronic archive is a key issue in
reducing the physical storage area. Here, the effect of different types of
Arabic fonts on electronic archive size is discussed. Simulation
results show that PDF is the best file format for storing Arabic
documents in an electronic archive. Furthermore, fast information
detection in a given PDF file is introduced. This approach uses fast
neural networks (FNNs) implemented in the frequency domain. The
operation of these networks relies on performing cross-correlation in
the frequency domain rather than in the spatial domain. It is proved
mathematically and practically that the number of computation steps
required by the presented FNNs is less than that needed by
conventional neural networks (CNNs). Simulation results using
MATLAB confirm the theoretical computations.
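To make the frequency-domain idea concrete, here is a minimal 1-D sketch using only the Python standard library. It assumes circular cross-correlation, signal lengths that are a power of two, and a hand-written radix-2 FFT; the function names are hypothetical, and this is not the authors' MATLAB code or their FNN, only the correlation step that such networks rely on.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def xcorr_spatial(x, w):
    """Direct circular cross-correlation: c[k] = sum_n w[n] * x[(n + k) % N]."""
    n = len(x)
    return [sum(w[i] * x[(i + k) % n] for i in range(n)) for k in range(n)]


def xcorr_frequency(x, w):
    """Same correlation in the frequency domain: c = IFFT(FFT(x) * conj(FFT(w)))."""
    n = len(x)
    X = fft([complex(v) for v in x])
    W = fft([complex(v) for v in w])
    prod = [xi * wi.conjugate() for xi, wi in zip(X, W)]
    return [v.real / n for v in fft(prod, invert=True)]


# Toy input row and a zero-padded weight pattern (hypothetical values).
x = [0, 0, 1, 2, 3, 0, 0, 0]
w = [1, 2, 3, 0, 0, 0, 0, 0]
spatial = xcorr_spatial(x, w)
frequency = xcorr_frequency(x, w)
```

Both functions return the same vector, and the peak marks where the pattern sits in the input. The direct version needs O(N²) multiplications while the FFT route needs O(N log N), which is why moving the correlation into the frequency domain reduces the number of computation steps.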
Abstract: In this paper, an Arabic letter recognition system based on Artificial Neural Networks (ANNs) and statistical analysis for feature extraction is presented. The ANN is trained using the Least Mean Squares (LMS) algorithm. In the proposed system, each typed Arabic letter is represented by a matrix of binary numbers that is used as input to a simple feature extraction system; the output of this system, together with the input matrix, is fed to an ANN. Simulation results show that the proposed system always produces a lower Mean Squared Error (MSE) and higher success rates than current ANN solutions.
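A minimal sketch of this pipeline follows. It assumes row sums, column sums and pixel density as the "statistical" features (the abstract does not list the actual features) and a single linear neuron trained with the LMS (Widrow-Hoff) rule in place of a full ANN. All function names and the toy 3x3 "letters" are hypothetical.

```python
def extract_features(matrix):
    """Hypothetical statistical features: row sums, column sums, pixel density."""
    rows = [sum(r) for r in matrix]
    cols = [sum(c) for c in zip(*matrix)]
    density = sum(rows) / (len(matrix) * len(matrix[0]))
    return rows + cols + [density]


def to_input(matrix):
    """Network input = flattened binary matrix followed by its extracted features."""
    return [p for row in matrix for p in row] + extract_features(matrix)


def predict(w, x):
    # w[0] is the bias weight; the bias input is fixed at 1.
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def lms_train(samples, targets, lr=0.01, epochs=200):
    """LMS (Widrow-Hoff) rule: w <- w + lr * (d - y) * x, applied sample by sample."""
    w = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        for x, d in zip(samples, targets):
            err = d - predict(w, x)
            w[0] += lr * err
            for i, xi in enumerate(x, start=1):
                w[i] += lr * err * xi
    return w


def mse(w, samples, targets):
    return sum((d - predict(w, x)) ** 2 for x, d in zip(samples, targets)) / len(samples)


# Two toy binary "letters" with targets +1 and -1.
vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
X = [to_input(vertical), to_input(horizontal)]
w = lms_train(X, [1.0, -1.0])
```

The learning rate must stay small relative to the squared norm of the inputs for LMS to converge; with these inputs 0.01 is comfortably inside that range.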
Abstract: Field Association (FA) terms are a limited set of discriminating terms that identify document fields, which makes them effective in document classification, similar file retrieval and passage retrieval. The problem, however, is the lack of an effective method for automatically extracting relevant Arabic FA terms to build a comprehensive dictionary. Moreover, all previous studies are based on FA terms in English and Japanese, and extending FA terms to other languages such as Arabic could strengthen further research. This paper presents a new method to extract Arabic FA terms from domain-specific corpora using part-of-speech (POS) pattern rules and corpora comparison. Experimental evaluation is carried out for 14 different fields using 251 MB of domain-specific corpora obtained from Arabic Wikipedia dumps and Alhyah news; the method selected an average of 2,825 FA terms (single and compound) per field. In the experiments, recall and precision are 84% and 79%, respectively. Therefore, this method selects a higher number of relevant Arabic FA terms at high precision and recall.
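The extraction pipeline described above (POS pattern rules followed by corpora comparison) can be sketched as follows. The tag set, the three patterns, the frequency-ratio score and the threshold are all assumptions for illustration; the abstract does not give the paper's actual rules or scoring, and the input is assumed to be already POS-tagged.

```python
from collections import Counter

# Hypothetical POS patterns for single and compound candidate terms.
PATTERNS = [("NOUN",), ("NOUN", "NOUN"), ("NOUN", "ADJ")]


def candidates(tagged):
    """Yield word n-grams whose POS tag sequence matches one of PATTERNS."""
    for i in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[i:i + len(pattern)]
            if len(window) == len(pattern) and tuple(tag for _, tag in window) == pattern:
                yield " ".join(word for word, _ in window)


def fa_terms(field_corpus, general_corpus, threshold=2.0):
    """Corpus comparison: keep candidates whose relative frequency in the field
    corpus is at least `threshold` times their relative frequency in a general corpus."""
    field = Counter(candidates(field_corpus))
    general = Counter(candidates(general_corpus))
    nf, ng = sum(field.values()), sum(general.values())
    selected = {}
    for term, f in field.items():
        g = general[term]
        ratio = float("inf") if g == 0 else (f / nf) / (g / ng)
        if ratio >= threshold:
            selected[term] = ratio
    return selected


# Tiny hypothetical tagged corpora: a sports field corpus and a general corpus.
sports = [("مباراة", "NOUN"), ("كرة", "NOUN"), ("القدم", "NOUN"), ("في", "PREP"), ("يوم", "NOUN")]
general = [("يوم", "NOUN"), ("في", "PREP"), ("المدينة", "NOUN"), ("يوم", "NOUN")]
terms = fa_terms(sports, general)
```

Here the compound "كرة القدم" (football) never occurs in the general corpus and is kept, while "يوم" (day) is common in both corpora and is rejected.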
Abstract: In this paper we present the first Arabic sentence
dataset for on-line handwriting recognition written on a Tablet PC. The
dataset is natural, simple and clear. Texts are sampled from daily
newspapers. To collect naturally written handwriting, forms are
dictated to writers. The current version of our dataset includes 154
paragraphs written by 48 writers. It contains more than 3,800 words
and more than 19,400 characters. The handwritten texts were mainly
written by researchers from different research centers. In order to use
this dataset in a recognition system, word extraction is needed. In this
paper, a new word extraction technique based on the cursive nature of
Arabic handwriting is also presented. The technique is applied
to this dataset and good results are obtained. The results can
serve as a benchmark against which future research can be compared.
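The paper's word extraction technique is not described in the abstract. As a hedged point of comparison only, a common baseline for on-line handwriting groups pen strokes by horizontal gaps: strokes belonging to one word (its connected pieces and dots) sit close together, while words are separated by wider gaps. The sketch assumes strokes are lists of (x, y) points and uses a hypothetical `gap_threshold` parameter.

```python
def stroke_extent(stroke):
    """Horizontal extent (xmin, xmax) of a pen stroke given as (x, y) points."""
    xs = [x for x, _ in stroke]
    return min(xs), max(xs)


def extract_words(strokes, gap_threshold):
    """Group strokes into words, scanning right to left as Arabic is written.
    Strokes that overlap or nearly touch the current word (dots, word pieces)
    are merged into it; a horizontal gap larger than gap_threshold starts a new word."""
    ordered = sorted(strokes, key=lambda s: -stroke_extent(s)[1])
    words, current, left_edge = [], [], None
    for stroke in ordered:
        xmin, xmax = stroke_extent(stroke)
        if current and left_edge - xmax > gap_threshold:
            words.append(current)
            current, left_edge = [], None
        current.append(stroke)
        left_edge = xmin if left_edge is None else min(left_edge, xmin)
    if current:
        words.append(current)
    return words


# Toy strokes (x grows to the right); coordinates are hypothetical.
body = [(90, 5), (100, 6)]
dot = [(94, 12), (95, 12)]
piece = [(70, 5), (85, 5)]
next_word = [(20, 5), (40, 6)]
words = extract_words([next_word, body, piece, dot], gap_threshold=10)
```

A fixed threshold is the weakest part of such a baseline; writer-adaptive thresholds (for example, relative to the median gap) are a natural refinement.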
Abstract: Electronically available Urdu data is in image form,
which is very difficult to process. Printed Urdu data is the root cause
of the problem. So, for the rapid progress of the Urdu language, we need
OCR systems, which can help make Urdu data available to the
common person. Research has been carried out for years to automate
the recognition of Arabic and Urdu script. But the biggest hurdle in the
development of Urdu OCR is the challenge of recognizing Nastalique script,
which is taken as the standard for writing the Urdu language. Nastalique
script is written diagonally with no fixed baseline, which makes the script
somewhat complex. Overlap is present not only in characters but in
the ligatures as well. This paper proposes a method which allows
successful recognition of Nastalique script.
Abstract: This paper is concerned with the production of an Arabic word semantic similarity (WSS) benchmark dataset. It is the first of its kind for Arabic and was developed specifically to assess the accuracy of word semantic similarity measurements. Semantic similarity is an essential component of numerous applications in fields such as natural language processing, artificial intelligence, linguistics, and psychology. Most of the reported work has been done for English. To the best of our knowledge, there is no word similarity measure developed specifically for Arabic. In this paper, an Arabic benchmark dataset of 70 word pairs is presented. New methods and the best available techniques have been used in this study to produce the Arabic dataset. This includes selecting and creating materials, collecting human ratings from a representative sample of participants, and calculating the overall ratings. This dataset will make a substantial contribution to future work in the field of Arabic WSS, and hopefully it will be considered a reference basis from which to evaluate and compare different methodologies in the field.
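Two steps mentioned above, calculating the overall rating of each word pair and using the benchmark to evaluate a similarity measure, can be sketched as follows. Averaging participants' ratings and scoring a measure by its Pearson correlation with those averages are common practice for WSS benchmarks, but the rating scale, word pairs and measure scores below are hypothetical, not taken from the paper.

```python
from statistics import mean


def overall_ratings(ratings):
    """Average the human ratings collected for each word pair."""
    return {pair: mean(scores) for pair, scores in ratings.items()}


def pearson(xs, ys):
    """Pearson correlation between human ratings and a measure's scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical ratings on an assumed 0-4 scale (not the paper's data).
ratings = {
    ("سيارة", "عربة"): [4, 3.5, 4],
    ("شاطئ", "ساحل"): [3, 3.5, 3],
    ("قلم", "غابة"): [0, 0.5, 0],
}
gold = overall_ratings(ratings)
pairs = list(gold)
measure = {pairs[0]: 0.9, pairs[1]: 0.7, pairs[2]: 0.1}  # scores from some WSS measure
r = pearson([gold[p] for p in pairs], [measure[p] for p in pairs])
```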
Abstract: Recognition of characters greatly depends on the features used. Several features of handwritten Arabic characters are selected and discussed. An off-line recognition system based on the selected features was built. The system was trained and tested with realistic samples of handwritten Arabic characters. The importance and accuracy of the selected features are evaluated. Recognition based on the selected features gives average accuracies of 88% and 70% for numbers and letters, respectively. Further improvements are achieved by using feature weights based on insights gained from the accuracies of individual features.
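The abstract does not say how the feature weights were derived. One plausible reading, sketched below under that assumption, is to normalize each feature's individual accuracy into a weight and use it in a weighted nearest-prototype distance. The classifier, prototypes and accuracies are hypothetical.

```python
def feature_weights(accuracies):
    """Normalize per-feature recognition accuracies into weights that sum to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


def classify(sample, prototypes, weights):
    """Weighted nearest-prototype classifier: features that were more accurate on
    their own contribute more to the squared distance."""
    def distance(proto):
        return sum(w * (s - p) ** 2 for w, s, p in zip(weights, sample, proto))
    return min(prototypes, key=lambda label: distance(prototypes[label]))


# Two hypothetical classes described by two features.
prototypes = {"A": [0.0, 0.0], "B": [1.0, 1.0]}
sample = [0.2, 0.9]
unweighted = classify(sample, prototypes, [1.0, 1.0])
weighted = classify(sample, prototypes, feature_weights([0.9, 0.1]))
```

With equal weights the noisy second feature pulls the sample to "B"; once the reliable first feature dominates the distance, the sample is assigned to "A".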
Abstract: In this study, a fuzzy similarity approach for Arabic web page classification is presented. The approach uses a fuzzy term-category relation by manipulating the membership degree for the training data and the degree value for a test web page. Six measures are used and compared in this study: Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy and Bounded Difference. These measures are applied and compared using 50 different Arabic web pages. The Einstein measure gave the best performance among the measures. An analysis of these measures and concluding remarks are presented in this study.
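Most of the measures listed above correspond to standard fuzzy t-norms ("Special case fuzzy" is not defined in the abstract and is omitted; "MinMax" is read here as the minimum t-norm). A minimal sketch, assuming each measure combines a term's membership degree in the test page with its degree in a category and that the category with the largest summed score wins; the scoring rule, membership degrees and category names are illustrative assumptions.

```python
def einstein(a, b):
    """Einstein product t-norm."""
    return a * b / (2 - (a + b - a * b))


def algebraic(a, b):
    """Algebraic product t-norm."""
    return a * b


def hamacher(a, b):
    """Hamacher product t-norm (gamma = 0), defined as 0 when a = b = 0."""
    return 0.0 if a == b == 0 else a * b / (a + b - a * b)


def min_max(a, b):
    """Minimum t-norm."""
    return min(a, b)


def bounded_difference(a, b):
    """Bounded difference (Lukasiewicz) t-norm."""
    return max(0.0, a + b - 1)


T_NORMS = {"Einstein": einstein, "Algebraic": algebraic, "Hamacher": hamacher,
           "MinMax": min_max, "Bounded Difference": bounded_difference}


def classify_page(page, categories, tnorm):
    """Score each category by summing tnorm(page degree, category degree) over the
    page's terms and return the best-scoring category."""
    def score(cat):
        return sum(tnorm(mu, cat.get(term, 0.0)) for term, mu in page.items())
    return max(categories, key=lambda name: score(categories[name]))


# Hypothetical membership degrees learned from training pages.
categories = {"sports": {"مباراة": 0.9, "فريق": 0.8}, "economy": {"سوق": 0.9, "أسهم": 0.7}}
page = {"مباراة": 0.7, "فريق": 0.6, "سوق": 0.1}
labels = {name: classify_page(page, categories, t) for name, t in T_NORMS.items()}
```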
Abstract: In this research, natural canthaxanthin, one of the
most important carotenoids, was extracted from Dietzia
natronolimnaea HS-1. The changes in canthaxanthin enriched in
oil-in-water emulsions with vegetable oil (5 mg/100 mL), Arabic gum (5
mg/100 mL), and potassium sorbate (0.5 g/100 mL) were investigated.
The effects of different pH values (3, 5 and 7), as well as treatment time (3,
18 and 33 days) at ambient temperature (24°C), on the
degradation were studied by response surface methodology (RSM).
The Hunter values (L*, a*, and b*) and the concentration of
canthaxanthin (C, mg/L) showed greater degradation of this pigment
at low pH (pH ≤ 4) over time (≥ 10 days), with R² of 97.00%,
91.31%, 97.60%, and 99.54% for C, L*, a*, and b*, respectively. The
predicted models were found to be significant (p
Abstract: This paper studies the effect of different compression
constraints and schemes, presented in a new and flexible paradigm, to
achieve high compression ratios and acceptable signal-to-noise ratios
of Arabic speech signals. Compression parameters are computed for
variable frame sizes of a level 5 to 7 Discrete Wavelet Transform
(DWT) representation of the signals for different analyzing mother
wavelet functions. Results are obtained and compared for global
thresholding and level-dependent thresholding techniques. The results
obtained also include comparisons in terms of Signal-to-Noise Ratio, Peak
Signal-to-Noise Ratio and Normalized Root Mean Square Error.
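A minimal sketch of global thresholding on a DWT representation and of the three quality measures named above. It assumes the Haar wavelet (the paper compares several mother wavelets), hard thresholding of the detail bands only, and common textbook definitions of SNR, PSNR and NRMSE; the toy sine signal stands in for an Arabic speech frame. A level-dependent scheme would pass one threshold per detail band instead of a single global value.

```python
import math

S = 1 / math.sqrt(2)


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT; returns [cA_L, cD_L, ..., cD_1].
    len(signal) must be divisible by 2**levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) * S for a, b in pairs])
        approx = [(a + b) * S for a, b in pairs]
    return [approx] + details


def haar_idwt(coeffs):
    """Inverse of haar_dwt."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = []
        for a, d in zip(approx, detail):
            out += [(a + d) * S, (a - d) * S]
        approx = out
    return approx


def global_threshold(coeffs, thr):
    """Hard-threshold all detail bands with one global threshold; keep cA_L intact."""
    return [coeffs[0]] + [[c if abs(c) >= thr else 0.0 for c in band] for band in coeffs[1:]]


def quality(x, r):
    """SNR (dB), PSNR (dB) and NRMSE between original x and reconstruction r."""
    n = len(x)
    mean_x = sum(x) / n
    err = sum((a - b) ** 2 for a, b in zip(x, r))
    var = sum((a - mean_x) ** 2 for a in x)
    snr = 10 * math.log10(var / err)
    psnr = 10 * math.log10(n * max(abs(a) for a in x) ** 2 / err)
    nrmse = math.sqrt(err / var)
    return snr, psnr, nrmse


x = [math.sin(2 * math.pi * i / 16) for i in range(64)]  # toy signal, not speech
coeffs = global_threshold(haar_dwt(x, levels=3), thr=0.1)
zeros = sum(c == 0.0 for band in coeffs[1:] for c in band)  # zeroed coefficients need no storage
snr, psnr, nrmse = quality(x, haar_idwt(coeffs))
```

Because the Haar transform is orthonormal, the reconstruction error energy equals the energy of the zeroed coefficients, so the threshold directly trades compression against SNR.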
Abstract: This paper presents an Echo State Network (ESN)-based Arabic
phoneme recognition system trained with supervised, forced and combined
supervised/forced supervised learning algorithms. Mel-Frequency
Cepstrum Coefficients (MFCCs) and Linear Predictive Coding (LPC)
techniques are used and compared as the input feature extraction
technique. The system is evaluated using 6 speakers from the King
Abdulaziz Arabic Phonetics Database (KAPD) for the Saudi Arabian
dialect and 34 speakers from the Center for Spoken Language
Understanding (CSLU2002) database of speakers with different
dialects from 12 Arab countries. Results for the KAPD and
CSLU2002 Arabic databases show phoneme recognition
performances of 72.31% and 38.20%, respectively.
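An echo state network keeps a fixed random recurrent reservoir and trains only a linear readout. The sketch below assumes a tanh reservoir, a ridge-regression readout (a common choice, not necessarily the supervised or forced learning algorithms used in the paper) and toy two-dimensional frames standing in for MFCC or LPC vectors. All names and parameter values are hypothetical.

```python
import math
import random


def make_reservoir(n_in, n_res, scale=0.9, seed=0):
    """Random input and recurrent weights. W is rescaled so that its infinity norm
    (max absolute row sum) equals `scale` < 1; with tanh units this makes the state
    update a contraction, a conservative sufficient condition for the echo state
    property (practitioners often scale the spectral radius instead)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    w = [[rng.uniform(-1, 1) for _ in range(n_res)] for _ in range(n_res)]
    norm = max(sum(abs(v) for v in row) for row in w)
    return w_in, [[v * scale / norm for v in row] for row in w]


def run_reservoir(w_in, w, frames):
    """x(t) = tanh(W_in u(t) + W x(t-1)), starting from x(0) = 0."""
    x = [0.0] * len(w)
    states = []
    for u in frames:
        x = [math.tanh(sum(a * b for a, b in zip(w_in[i], u)) +
                       sum(a * b for a, b in zip(w[i], x))) for i in range(len(w))]
        states.append([1.0] + x)  # leading 1.0 acts as a bias input for the readout
    return states


def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [v] for row, v in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def train_readout(states, targets, ridge=1e-3):
    """Ridge-regression readout, one weight vector per class: w_k = (S^T S + ridge I)^-1 S^T y_k."""
    n = len(states[0])
    gram = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    return [solve(gram, [sum(s[i] * y[k] for s, y in zip(states, targets)) for i in range(n)])
            for k in range(len(targets[0]))]


def classify(w_out, state):
    scores = [sum(a * b for a, b in zip(wk, state)) for wk in w_out]
    return scores.index(max(scores))


# Toy 2-D "feature frames" for two phoneme classes (hypothetical data).
frames = [[2.0, -2.0]] * 8 + [[-2.0, 2.0]] * 8 + [[2.0, -2.0]] * 8 + [[-2.0, 2.0]] * 8
labels = [0] * 8 + [1] * 8 + [0] * 8 + [1] * 8
w_in, w = make_reservoir(n_in=2, n_res=20)
states = run_reservoir(w_in, w, frames)
w_out = train_readout(states, [[1.0 if k == c else 0.0 for k in range(2)] for c in labels])
```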
Abstract: This article addresses the procedures used to validate the Arabic version of the Multiple Intelligence Development Assessment Scale (MIDAS). Content validity was examined based on experts' judgments of the MIDAS items in the Arabic version. The content of eleven items in the Arabic version of MIDAS was modified to match the Arabic context. Then a translation from the original English version of MIDAS into Arabic was performed. The reliability of the Arabic MIDAS was calculated using the test-retest method and found to be 0.85 for the overall MIDAS, ranging between 0.78 and 0.87 for the different subscales. Construct validity for the overall Arabic MIDAS and its subscales was examined using Winsteps version 6, based on the Rasch model, in order to fit the items into the Arabic context. The findings indicated that the eight subscales of the Arabic version of MIDAS are unidimensional, and the total number of items retained in the overall scale is 108.
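Winsteps fits Rasch-family models. Assuming the MIDAS items use ordered response categories, the relevant form is a polytomous Rasch model such as Andrich's rating scale model, sketched below. The person measure, item difficulty and thresholds are hypothetical, and the unidimensionality checks reported in the paper are not reproduced.

```python
import math


def rating_scale_probs(theta, delta, taus):
    """Andrich rating scale model (a polytomous Rasch model).
    theta: person measure, delta: item difficulty, taus: category thresholds.
    P(X = k) is proportional to exp(sum_{j<=k} (theta - delta - tau_j));
    category 0 uses the empty sum, i.e. exp(0)."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + theta - delta - tau)
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [wt / total for wt in weights]


def expected_score(theta, delta, taus):
    """Expected category score of a person on an item under the model."""
    return sum(k * p for k, p in enumerate(rating_scale_probs(theta, delta, taus)))


# Hypothetical item with four ordered categories (three thresholds).
taus = [-1.0, 0.0, 1.0]
probs = rating_scale_probs(theta=1.2, delta=-0.3, taus=taus)
```

With a single threshold the model reduces to the dichotomous Rasch model, P(X = 1) = exp(theta - delta - tau) / (1 + exp(theta - delta - tau)).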
Abstract: Document retrieval in Information Retrieval
Systems (IRS) is generally about understanding the
information in the documents concerned. The better the system is
able to understand the contents of documents, the more
effective the retrieval outcomes will be. But understanding the
contents is a very complex task. Conventional IRS apply algorithms
that can only approximate the meaning of document contents through
a keyword approach using the vector space model. Keywords may be
unstemmed or stemmed. When keywords are stemmed and conflated
in the retrieval process, we are a step forward in applying semantic
technology in IRS. Word stemming is a process in morphological
analysis under natural language processing, before syntactic and
semantic analysis. We have developed stemming algorithms for Malay and
Arabic and incorporated stemming in our experimental systems in
order to measure retrieval effectiveness. The results have shown that
retrieval effectiveness increased when stemming was used in
the systems.
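To illustrate why stemming helps keyword retrieval, the sketch below strips a few common Arabic prefixes and suffixes before building term-frequency vectors and compares texts by cosine similarity. The affix lists and minimum stem length are illustrative assumptions, not the authors' Malay or Arabic stemming algorithms.

```python
import math
from collections import Counter

# Hypothetical affix lists; longer prefixes are tried first.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال"]
SUFFIXES = ["ات", "ون", "ين", "ان", "ها", "ية", "ة", "ه"]


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word


def vector(text, stem=True):
    """Term-frequency vector of a text, with or without stemming."""
    return Counter(light_stem(w) if stem else w for w in text.split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


query, document = "المعلمون", "معلم المدرسة"  # "the teachers" vs. "school teacher"
```

Without stemming the query and document share no keyword; after stemming both contain "معلم" (teacher), so the document is retrieved.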
Abstract: Many natural language expressions are ambiguous and
need to draw on other sources of information to be interpreted.
Whether the word تعاون is read as a noun ("cooperation") or a verb
("cooperated") depends on the presence of contextual cues. To interpret
words, we need to be able to discriminate between their different usages.
This paper proposes a hybrid of rule-based and machine learning methods for
tagging Arabic words. Because an Arabic word may be
composed of a stem plus affixes and clitics, a small number of rules
dominate the performance (affixes include inflectional markers for
tense, gender and number; clitics include some prepositions,
conjunctions and others). Tagging is closely related to the notion of
word class used in syntax. The method first applies rules (which
consider the post-position, the ending of a word, and patterns), and
then corrects the anomalies by adopting a memory-based learning (MBL)
method. Memory-based learning is an efficient method for integrating
various sources of information and handling exceptional
data in natural language processing tasks. Second, the exceptional cases
of the rules are checked, and more information is made available to
the learner for treating those exceptional cases. To evaluate the
proposed method, a number of experiments have been run to
determine the importance of the various sources of information in
learning.
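The two-stage method can be sketched as a rule-based first pass whose output becomes a feature for a memory-based learner. The sketch assumes an IB1-style learner (store all training instances, label a new word by the nearest stored instances under a feature-overlap distance); the rules, feature set, tags and training sentences are hypothetical and much simpler than the paper's.

```python
from collections import Counter


def rule_tag(word):
    """Hypothetical first-pass rules based on word prefixes and endings."""
    if word.startswith("ال") or word.endswith("ة"):
        return "NOUN"
    if word.startswith(("ي", "ت", "ن")) and len(word) > 3:
        return "VERB"
    if word in {"في", "على", "من", "إلى"}:
        return "PREP"
    return "NOUN"


def features(words, i):
    """Instance for word i: the word, its rule tag, and the rule tags of its neighbours."""
    prev_tag = rule_tag(words[i - 1]) if i > 0 else "<s>"
    next_tag = rule_tag(words[i + 1]) if i + 1 < len(words) else "</s>"
    return (words[i], rule_tag(words[i]), prev_tag, next_tag)


class MemoryBasedTagger:
    """IB1-style learner: store every training instance; tag a new instance with the
    majority tag among stored instances with the fewest mismatching features."""

    def __init__(self):
        self.memory = []

    def train(self, sentences):
        for words, tags in sentences:
            for i, tag in enumerate(tags):
                self.memory.append((features(words, i), tag))

    def tag(self, words):
        out = []
        for i in range(len(words)):
            f = features(words, i)
            dists = [(sum(a != b for a, b in zip(f, g)), t) for g, t in self.memory]
            best = min(d for d, _ in dists)
            out.append(Counter(t for d, t in dists if d == best).most_common(1)[0][0])
        return out


training = [
    (["تعاون", "الطلاب"], ["VERB", "NOUN"]),             # "the students cooperated"
    (["في", "تعاون", "مثمر"], ["PREP", "NOUN", "ADJ"]),  # "in fruitful cooperation"
]
tagger = MemoryBasedTagger()
tagger.train(training)
tags = tagger.tag(["في", "تعاون", "كبير"])  # "in great cooperation"
```

The rules alone tag تعاون as a verb; the stored instance with a preceding preposition lets the learner correct it to a noun in this context.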
Abstract: The purpose of this study is to investigate the effects
of modality principles in instructional software on first-grade
pupils' achievement in learning the Arabic language. Two modes
of instructional software were systematically designed and
developed: audio with images (AI) and text with images (TI). A
quasi-experimental design was used in the study. The sample
consisted of 123 male and female pupils from the IRBED Education
Directorate, Jordan. The pupils were randomly assigned to one of
the two modes. The independent variables comprised the two modes
of the instructional software, the students' achievement levels in the
Arabic language class, and gender. The dependent variable was the
achievement of the pupils in the Arabic language test. The
theoretical framework of this study was based on Mayer's Cognitive
Theory of Multimedia Learning. Four hypotheses were postulated
and tested. Analysis of variance (ANOVA) showed that pupils using
the AI mode performed significantly better than those using the TI
mode. This study concluded that the audio with images mode was an
important aid to learning compared with the text with images mode.
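The ANOVA comparison of the AI and TI groups rests on the F statistic. A minimal one-way sketch follows; the study's design also includes achievement level and gender, which would call for a factorial ANOVA, and turning F into a p-value would additionally need the F distribution (for example from SciPy), which is omitted. The scores are hypothetical.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical test scores for the two software modes.
ai_scores = [18, 20, 22, 19]
ti_scores = [14, 15, 17, 16]
f_stat, df1, df2 = one_way_anova(ai_scores, ti_scores)
```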
Abstract: Despite the fact that Arabic is currently one
of the most widely spoken languages worldwide, there has been relatively
little research on Arabic speech recognition compared with other
languages such as English and Japanese. Generally, digital speech
processing and voice recognition algorithms are of special
importance for designing efficient, accurate, and fast automatic
speech recognition systems. The speech recognition process
carried out in this paper is divided into three stages. First,
the signal is preprocessed to reduce noise effects. After that, the
signal is digitized and hearingized. The voice activity
regions are then segmented using a voice activity detection (VAD)
algorithm. Second, features are extracted from the speech signal
using the Mel-frequency cepstral coefficients (MFCC) algorithm.
Moreover, delta and acceleration (delta-delta) coefficients have been
added to improve the recognition accuracy. Finally,
each test word's features are compared with the training database using
the dynamic time warping (DTW) algorithm. Using the best setup
of all parameters affecting the aforementioned techniques,
the proposed system achieved a recognition rate of about 98.5%,
which outperformed other HMM- and ANN-based approaches
available in the literature.
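The feature and matching stages can be sketched as follows: delta and delta-delta coefficients are computed with the standard regression formula and appended to each frame, and a test word receives the label of the training template with the smallest DTW distance. The sketch assumes MFCC frames are already available as lists of floats (MFCC extraction and VAD are not shown), uses Euclidean frame distance, and its templates are toy data.

```python
import math


def deltas(frames, N=2):
    """Regression deltas: d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2),
    repeating the first and last frames at the edges."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return [[sum(n * (frames[min(t + n, T - 1)][k] - frames[max(t - n, 0)][k])
                 for n in range(1, N + 1)) / denom
             for k in range(len(frames[0]))]
            for t in range(T)]


def add_dynamics(frames):
    """Append delta and delta-delta (acceleration) coefficients to each frame."""
    d = deltas(frames)
    dd = deltas(d)
    return [c + dc + ddc for c, dc, ddc in zip(frames, d, dd)]


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(test, templates):
    """Label of the training template with the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word]))


# Toy templates: a slower utterance of "naam" still matches its template exactly.
templates = {"naam": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], "laa": [[5.0, 5.0], [5.0, 5.0]]}
test = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
```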
Abstract: Transliteration is frequently used, especially in writing geographic names, personal names (onyms), etc. Proper names (onyms) of all languages must sound similar in translated works as well as in scientific projects and works written in the mother tongue, because we become acquainted with a nation, its history, culture, traditions and other spiritual values through its onyms. Therefore, it is necessary to systematize the different transliterations of onyms from foreign languages. This paper is dedicated to the problem of developing a project for transliterating Kazakh onyms into Arabic. To achieve this goal, we use the scientific or practical type of transliteration. Because this type of transliteration allows source-language texts to be read and written easily in the target language without any diacritical symbols, it is limited by the target language's alphabetic system.
Abstract: An automatic speech recognition system for the
formal Arabic language is needed. The Quran is the most formal
spoken book in Arabic, and it is recited all over the world. In this
research, a speaker-independent automatic speech recognizer for
Quranic speech was developed and tested. The system was developed
based on tri-phone Hidden Markov Models and Maximum
Likelihood Linear Regression (MLLR). MLLR computes a set
of transformations that reduces the mismatch between an initial
model set and the adaptation data. It uses a regression class tree and
estimates a set of linear transformations for the mean and
variance parameters of a Gaussian mixture HMM system. The 30th
Chapter of the Quran, recited by five of the most famous readers of the
Quran, was used for training and testing. The chapter
includes about 2000 distinct words. The advantages of using
Quranic verses as the database for this recognizer are the
uniqueness of the words and the high level of orderliness between
verses. The accuracy on the test data ranged from 68% to 85%.
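In MLLR, each Gaussian mean is adapted as mu_hat = W xi, where xi = [1, mu] is the extended mean vector and W = [b A] is the transform shared by all Gaussians in the same regression class. The sketch below only applies given transforms; estimating W from adaptation data (an EM-based step) and the variance transforms are not shown. The Gaussian ids, class names and transform values are hypothetical.

```python
def apply_mllr_mean(W, mu):
    """Adapt one Gaussian mean: mu_hat = W @ xi with xi = [1, mu_1, ..., mu_d].
    W has d rows and d + 1 columns: the bias b followed by the matrix A."""
    xi = [1.0] + list(mu)
    return [sum(w * x for w, x in zip(row, xi)) for row in W]


def adapt_means(means, classes, transforms):
    """means: {gaussian_id: mean}; classes: {gaussian_id: regression class};
    transforms: {regression class: W}. Each Gaussian uses its class's transform."""
    return {g: apply_mllr_mean(transforms[classes[g]], mu) for g, mu in means.items()}


means = {"g1": [1.0, 2.0], "g2": [0.0, -1.0]}
classes = {"g1": "vowels", "g2": "consonants"}  # hypothetical regression classes
transforms = {
    "vowels": [[0.5, 1.0, 0.0], [-0.5, 0.0, 2.0]],
    "consonants": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # identity: no adaptation
}
adapted = adapt_means(means, classes, transforms)
```

Sharing one transform per regression class is what lets MLLR adapt Gaussians that received little or no adaptation data.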
Abstract: Abai Kunanbayev is famous as an enlightener,
composer, translator, public figure, philosopher and reformer who
wanted to enrich Kazakh literature through contact with Russian and
European culture, and also as the founder of the Kazakh written literary
language. Abai Kunanbayev was born in 1845 in the East Kazakhstan
area and passed away in 1904 in his hometown. His oeuvre absorbed
and reflected all the changes in the life of Kazakh society in the second
half of the XIX century. The XIX century, especially its second half,
was an important transition period for Kazakhstan, which radically
changed the traditional way of Kazakh society and predetermined its further
development as a result of the activation of Russian colonial policy
and the establishment of commodity-money relations in the Steppe Land.
Besides common Arabic and Persian words and loanwords from the Quran,
Abai Kunanbayev used in his words of edification many words of Arabic,
Persian, Latin, Russian, Nogai, Shaghatai, Polish, Greek and Turkish
origin that are used in the Kazakh language.
Abstract: This article is concerned with the translation of Quranic
verses into Braille symbols using a Visual Basic program. The
system has the ability to translate the special vibrations of the Quran.
This study is limited to the (Noon + Sukun) vibrations. It builds on an
existing translation system that combines a finite state machine with
left and right context matching and a set of translation rules. This
allows Arabic text to be translated into Braille symbols
after the vibrations in the Quranic verses are detected.
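Assuming that the "(Noon + Sukun) vibrations" correspond to the standard tajweed rules for a noon carrying a sukun (izhar, idgham, iqlab and ikhfa, decided by the following letter), the right-context step can be sketched as a left-to-right scan with lookahead. This is a simplification of a finite state machine, not the authors' Visual Basic system, and it returns rule labels only; the Braille cell assignments are not given in the abstract and are not invented here.

```python
NOON, SUKUN = "\u0646", "\u0652"
# Diacritics, tatweel and Quranic annotation marks to skip when looking for the next letter.
MARKS = ({chr(c) for c in range(0x064B, 0x0660)} | {"\u0640", "\u0670"}
         | {chr(c) for c in range(0x06D6, 0x06EE)})

# Standard tajweed groups for the letter that follows a noon sakinah.
RULES = {
    "izhar": set("ءأإؤئهعحغخ"),
    "idgham": set("يرملون"),
    "iqlab": set("ب"),
    "ikhfa": set("تثجدذزسشصضطظفقك"),
}


def next_letter(text, start):
    """Right context: the first base letter at or after `start`, skipping marks and spaces."""
    for ch in text[start:]:
        if ch not in MARKS and not ch.isspace():
            return ch
    return None


def noon_sakinah_rules(text):
    """Return (index, rule) for every noon followed directly by a sukun."""
    found = []
    for i, ch in enumerate(text):
        if ch == NOON and text[i + 1:i + 2] == SUKUN:
            letter = next_letter(text, i + 2)
            rule = next((name for name, letters in RULES.items() if letter in letters), None)
            if rule:
                found.append((i, rule))
    return found


# "min ba'di": noon sakinah before ba, the iqlab case.
example = noon_sakinah_rules("\u0645\u0650\u0646\u0652 \u0628\u064e\u0639\u0652\u062f\u0650")
```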