Abstract: This work presents a new phonetic transcription system based on a tree of hierarchical pronunciation rules expressed as context-specific grapheme-phoneme correspondences. The tree is automatically inferred from a phonetic dictionary by incrementally analyzing deeper context levels, eventually representing a minimum set of exhaustive rules that pronounce without errors all the words in the training dictionary and that can be applied to out-of-vocabulary words. The proposed approach improves upon existing rule-tree-based techniques in that it makes use of graphemes, rather than letters, as elementary orthographic units. A new linear algorithm for the segmentation of a word into graphemes is introduced to enable out-of-vocabulary grapheme-based phonetic transcription. Exhaustive rule trees provide a canonical representation of the pronunciation rules of a language that can be used not only to pronounce out-of-vocabulary words, but also to analyze and compare the pronunciation rules inferred from different dictionaries. The proposed approach has been implemented in C and tested on Oxford British English and Basic English. Experimental results show that grapheme-based rule trees represent phonetically sound rules and provide better performance than letter-based rule trees.
Abstract: The evaluation of the effect of non-conventional water
resources on seed germination and seedling growth at early growth
stages is still in progress, especially for forage crops. This
study was designed to test the effect of four water qualities
(treated wastewater (TWW), industrial water (IW), grey water (GW),
and distilled water (DW)) on germination and early seedling vigor of
Leucaena leucocephala. The results showed that the germination
was not significantly affected by the different water qualities. Seed
germination reached maximum after 17, 14, 14, and 21 days under
GW, IW, TWW, and DW treatments, respectively. The highest mean
shoot length was recorded under the GW treatment, and the highest
mean root length was recorded under DW, which was not significantly
different from the GW treatment. The mean shoot fresh weight was
highest under TWW. The mean root fresh weights were not significantly
different among treatments. Growth continued with no mortality
during the 21 days of growth. Thus, the best non-conventional water
alternatives, based on cleanness, nutrients, and toxicity, are GW,
TWW, and IW, respectively.
Abstract: Multiple sequence alignment is a fundamental part of
many bioinformatics applications such as phylogenetic analysis.
Many alignment methods have been proposed. Each method gives a
different result for the same data set, and consequently generates a
different phylogenetic tree. Hence, the chosen alignment method
affects the resulting tree. However, in the literature, there is no
evaluation of multiple alignment methods based on the comparison of
their phylogenetic trees. This work evaluates the following eight
aligners: ClustalX, T-Coffee, SAGA, MUSCLE, MAFFT, DIALIGN,
ProbCons and Align-m, based on their phylogenetic trees (test trees)
produced on a given data set. The Neighbor-Joining method is used
to estimate trees. Three criteria, namely, the dNNI, the dRF and the
Id_Tree, are established to test the ability of the different alignment
methods to produce a test tree close to the reference one
(true tree). Results show that the method which produces the most
accurate alignment gives the nearest test tree to the reference tree.
MUSCLE outperforms all aligners with respect to the three criteria
and for all datasets, performing particularly better when sequence
identities are within 10-20%. It is followed by T-Coffee at lower
sequence identity (30%), tree scores of all methods
become similar.
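As an illustration of one of the three criteria, the dRF is assumed here to be the Robinson-Foulds distance, which counts the bipartitions (splits) that occur in one unrooted tree but not in the other. The sketch below is a minimal version for trees written as nested tuples; the tree encoding and the function names are hypothetical and are not taken from the paper.

```python
def leaf_set(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaf_set(child) for child in tree))
    return frozenset([tree])


def splits(tree):
    """Collect the non-trivial bipartitions induced by internal edges."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # fixed leaf used to normalise each split
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return
        side = leaf_set(node)
        other = all_leaves - side
        # Keep only splits with at least two leaves on both sides.
        if len(side) > 1 and len(other) > 1:
            # Store the side that does not contain the anchor leaf.
            found.add(other if anchor in side else side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf labels")
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical reference tree and test tree over five taxa.
reference = ((("A", "B"), "C"), ("D", "E"))
test_tree = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(reference, test_tree))  # prints 2
```

A distance of 0 means the test tree has the same topology as the reference tree; larger values mean more disagreeing splits.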
Abstract: Data Mining aims at discovering knowledge out of
data and presenting it in a form that is easily comprehensible to
humans. One useful application in Egypt is cancer
management, especially the management of Acute Lymphoblastic
Leukemia or ALL, which is the most common type of cancer in
children.
This paper discusses the process of designing a prototype that can
help in the management of childhood ALL, which has a great
significance in the health care field. Besides, it has a social impact
on decreasing the rate of infection in children in Egypt. It also
provides valuable information about the distribution and
segmentation of ALL in Egypt, which may be linked to the possible
risk factors.
Undirected Knowledge Discovery is used since, in the case of this
research project, there is no target field as the data provided is
mainly subjective. This is done in order to quantify the subjective
variables. Therefore, the computer will be asked to identify
significant patterns in the provided medical data about ALL. This
may be achieved through collecting the data necessary for the
system, determining the data mining technique to be used for the
system, and choosing the most suitable implementation tool for the
domain.
The research makes use of a data mining tool, Clementine, to
apply the Decision Trees technique. We feed it with data extracted from
real-life cases taken from specialized Cancer Institutes. Relevant
medical case details such as patient medical history and diagnosis
are analyzed, classified, and clustered in order to improve the disease
management.
Abstract: Leo Breiman's Random Forests (RF) is a recent
development in tree-based classifiers that has quickly proven to be one
of the most important algorithms in the machine learning literature. It
has shown robust and improved classification results on standard
data sets. Ensemble learning algorithms such as AdaBoost and
Bagging have been actively researched and have shown improvements in
classification results for several benchmark data sets, mainly with
decision trees as their base classifiers. In this paper we
apply these meta-learning techniques to random forests. We
test the ensembles of random forests on the
standard data sets available in the UCI repository. We compare the
original random forest algorithm with its ensemble counterparts
and discuss the results.
Abstract: Timing-driven physical design, synthesis, and
optimization tools need efficient closed-form delay models for
estimating the delay associated with each net in an integrated circuit
(IC) design. The total number of nets in a modern IC design has
increased dramatically and now runs into the millions. Therefore, efficient
modeling of interconnections is needed for high-speed ICs. This
paper presents closed-form expressions for RC and RLC
interconnection trees in current-mode signaling, which can be
implemented in VLSI design tools. These analytical model
expressions can be used for accurate calculation of delay after the
design clock tree has been laid out and the design is fully routed.
Evaluation of these analytical models is several orders of magnitude
faster than simulation using SPICE.
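The paper's own current-mode expressions are not given in the abstract. As a stand-in, the sketch below uses the classical Elmore delay, a well-known first-order closed-form estimate for voltage-mode RC trees, to show why such formulas are cheap to evaluate: two passes over the tree suffice. The tree encoding, node names, and component values are hypothetical.

```python
def elmore_delays(parent, resistance, capacitance, root="src"):
    """Elmore delay at every node of an RC tree.

    parent[n]      -> parent of node n (the root has no entry)
    resistance[n]  -> resistance of the branch from parent[n] to n
    capacitance[n] -> capacitance lumped at node n
    """
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)

    # Total capacitance at and below each node (post-order sum).
    downstream = {}

    def total_cap(node):
        cap = capacitance.get(node, 0.0)
        for child in children.get(node, []):
            cap += total_cap(child)
        downstream[node] = cap
        return cap

    total_cap(root)

    # Delay accumulates R * downstream C along the path from the root.
    delay = {root: 0.0}

    def walk(node):
        for child in children.get(node, []):
            delay[child] = delay[node] + resistance[child] * downstream[child]
            walk(child)

    walk(root)
    return delay


# Hypothetical three-node net: src -> a, then a -> b and a -> c.
parent = {"a": "src", "b": "a", "c": "a"}
resistance = {"a": 1.0, "b": 2.0, "c": 3.0}
capacitance = {"a": 1.0, "b": 2.0, "c": 3.0}
delays = elmore_delays(parent, resistance, capacitance)
print(delays["b"], delays["c"])  # prints 10.0 15.0
```

The cost is linear in the number of nodes, which is the kind of saving over circuit simulation that closed-form models aim for.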
Abstract: This paper presents an effective framework for Chinese syntactic parsing, which includes two parts. The first is a parsing framework based on an improved bottom-up chart parsing algorithm; it integrates the beam search strategy of the N-best algorithm and the heuristic function of the A* algorithm for pruning, and then produces multiple parse trees. The second is a novel evaluation model, which integrates contextual and partial lexical information into the traditional PCFG model and defines a new score function. Using this model, the tree with the highest score is selected as the best parse tree. Finally, the results of comparative experiments are given. Keywords: syntactic parsing, PCFG, pruning, evaluation model.
Abstract: In this paper, we use a nonlinear system identification method to predict and detect process faults of a cement rotary kiln. After selecting proper inputs and output, an input-output model is identified for the plant. To identify the various operating points of the kiln, a Locally Linear Neuro-Fuzzy (LLNF) model is used. This model is trained by the LOLIMOT algorithm, which is an incremental tree-structure algorithm. Using this method, we obtained three distinct models for the normal and faulty situations in the kiln. One of the models is for the normal condition of the kiln, with a 15-minute prediction horizon. The other two models are for the two faulty situations in the kiln, with a 7-minute prediction horizon. Finally, we detect these faults in validation data. The data used in this study were collected from White Saveh Cement Company.
Abstract: The belief decision tree (BDT) approach is a decision
tree in an uncertain environment where the uncertainty is represented
through the Transferable Belief Model (TBM), one interpretation
of the belief function theory. The uncertainty can appear either in
the actual class of training objects or in the attribute values of objects to
classify. In this paper, we develop a post-pruning method of belief
decision trees in order to reduce size and improve classification
accuracy on unseen cases. The pruning of decision trees has
attracted considerable attention in machine learning.
Abstract: Random Forests are a powerful classification technique, consisting of a collection of decision trees. One useful feature of Random Forests is the ability to determine the importance of each variable in predicting the outcome. This is done by permuting each variable and computing the change in prediction accuracy before and after the permutation. This variable importance calculation is similar to a one-factor-at-a-time experiment and is therefore inefficient. In this paper, we use a regular fractional factorial design to determine which variables to permute. Based on the results of the trials in the experiment, we calculate the individual importance of the variables, with improved precision over the standard method. The method is illustrated with a study of student attrition at Monash University.
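The standard one-variable-at-a-time permutation importance described above can be sketched as follows. This is only the baseline that the paper improves on, not the fractional factorial method itself; the `predict` callable stands in for any fitted classifier, and all names and the toy data are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(predict, rows, labels, seed=0):
    """Drop in accuracy after shuffling each variable on its own."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild the data with only column j permuted.
        permuted = [row[:j] + (value,) + row[j + 1:]
                    for row, value in zip(rows, column)]
        importances.append(baseline - accuracy(predict, permuted, labels))
    return importances


# Toy setup: the "model" looks only at variable 0, so variable 1
# should receive an importance of exactly zero.
def toy_predict(row):
    return row[0] > 0.5


data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [toy_predict(row) for row in rows]
importances = permutation_importance(toy_predict, rows, labels)
```

Each variable needs its own permuted copy of the data and its own round of predictions, which is the one-factor-at-a-time cost the abstract refers to.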
Abstract: An electrocardiogram (ECG) data compression algorithm
is needed that reduces the amount of data to be transmitted, stored,
and analyzed without losing the clinical information content. A
wavelet ECG data codec based on the Set Partitioning In Hierarchical
Trees (SPIHT) compression algorithm is proposed in this paper. The
SPIHT algorithm has achieved notable success in still image coding.
We modified the algorithm for the one-dimensional (1-D) case and
applied it to compression of ECG data.
With this compression method, a small percent root mean square
difference (PRD) and a high compression ratio are achieved with low
implementation complexity. Experiments on selected
records from the MIT-BIH arrhythmia database revealed that the
proposed codec is significantly more efficient in compression and in
computation than previously proposed ECG compression schemes.
Compression ratios of up to 48:1 for ECG signals lead to acceptable
results for visual inspection.
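The two figures of merit named above can be computed as in the following sketch. It assumes the commonly used PRD definition, 100 times the square root of the error energy divided by the signal energy; the abstract does not state which PRD variant the authors use, and the function names are hypothetical.

```python
import math


def prd(original, reconstructed):
    """Percent root-mean-square difference between two signals."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    error_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    signal_energy = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(error_energy / signal_energy)


def compression_ratio(original_bits, compressed_bits):
    """Ratio of original size to compressed size, e.g. 48.0 for 48:1."""
    return original_bits / compressed_bits
```

A lower PRD means the reconstructed signal is closer to the original, so a codec is judged by how low it keeps the PRD at a given compression ratio.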
Abstract: Col is a classic combinatorial game played on graphs,
and solving a general instance is a PSPACE-complete problem.
However, winning strategies can be found for some specific graph
instances. In this paper, the solution of Col on complete k-ary trees
is presented.
Abstract: In areas where high-quality water is not
available, unconventional water sources are used for irrigation.
Household leachate is one of the sources used in dry and
semi-dry areas to water barren trees and plants. It meets
the plants' needs and also has some effects on the soil, but at the same
time it might cause some problems as well. This study was conducted to
evaluate the effect of using compost leachate on the density of soil
iron, using a statistical design called "Split Plot" with two
main treatments, one subsidiary treatment, and three repetitions of the
design over a three-month period. The main treatment N was
irrigation using well water as a blank treatment, and the main treatment I
was irrigation using leachate and well water
concurrently. The subsidiary treatments were DI (Drop Irrigation)
and SDI (Sub Drop Irrigation). Then, in the established plots, 36
biannual pine and cypress shrubs were randomly planted. Two months
later the treatments began. The results revealed that there was a
significant difference between the main treatment and the blank treatment
regarding pH decline in the soil, which was related to the amount of
leachate injected into the soil. After some time of leachate use, the
pH level fell by as much as 0.46, and it also increased due to large
amounts of leachate. The underneath drop irrigation gave better
results than sub drop irrigation since it keeps the soil texture fixed.
Abstract: In Algeria, some fruit trees bear fruit in the wild. Such trees include Celtis australis, Crataegus azarolus, Crataegus monogyna and Zizyphus lotus. In spite of their appreciable consumption, their nutritional value remains unknown. The objective of this study is the determination of sugars in the pulp and almond of the above fruits. The biochemical analysis shows that these fruits have notable contents of soluble sugars, which give them a significant caloric value, as well as significant fibre contents, which give them therapeutic and industrial benefits. The analysis of the almonds shows that they contain considerable amounts of sugars, which make them an energy-rich food.
Abstract: Power loss reduction is one of the main targets in the power industry, so this paper considers the problem of finding the optimal configuration of a radial distribution system for loss reduction. Optimal reconfiguration involves selecting the best set of branches to be opened, one from each loop, to reduce resistive line losses and to relieve overloads on feeders by shifting load to adjacent feeders. However, since there are many candidate switching combinations in the system, feeder reconfiguration is a complicated problem. In this paper, a new approach is proposed based on a simple optimum loss calculation by determining optimal trees of the given network. From graph theory, a distribution network can be represented by a graph that consists of a set of nodes and branches. In fact, this problem can be viewed as the problem of determining an optimal tree of the graph, which simultaneously ensures the radial structure of each candidate topology. In this method, a refined genetic algorithm is also set up, and some improvements to the algorithm are made in chromosome coding. The algorithm presented in [7] is also implemented by modifying a load flow program, and it is compared with the proposed method. In [7], an algorithm is proposed in which the choice of the switches to be opened is based on simple heuristic rules. This algorithm reduces the number of load flow runs, reduces the switching combinations to a smaller number, and gives the optimum solution. To demonstrate the validity of these methods, computer simulations with PSAT and MATLAB are carried out on a 33-bus test system. The results show that the proposed method performs better than the method of [7] and other methods.
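The graph view above implies a simple feasibility test for any candidate switching combination: the closed branches must form a spanning tree of the network. The sketch below is one way to check this with a union-find structure; it is not the paper's algorithm, and the function name, bus numbering, and small example network are hypothetical.

```python
def is_radial(num_buses, branches, open_switches):
    """True if the closed branches form a spanning tree of the network."""
    closed = [b for i, b in enumerate(branches) if i not in open_switches]
    # A spanning tree on n buses has exactly n - 1 branches.
    if len(closed) != num_buses - 1:
        return False
    root = list(range(num_buses))

    def find(x):
        # Follow parent links to the set representative, halving paths.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for u, v in closed:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # closing this branch would create a loop
        root[ru] = rv
    return True


# Hypothetical 4-bus network with 5 branches, i.e. two loops.
branches = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
print(is_radial(4, branches, {3, 4}))  # prints True
print(is_radial(4, branches, {0, 3}))  # prints False
```

In the second call bus 0 is left isolated while buses 1, 2 and 3 form a loop, so the configuration is rejected even though the branch count is right. A search procedure such as a genetic algorithm could use a check like this to discard infeasible candidates before running a load flow.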
Abstract: Landscape connectivity combines a description of the
physical structure of the landscape with a species-specific response to
that structure, which forms the theoretical background of applying
landscape connectivity principles in the practices of landscape
planning and design. In this study, a residential development project in
the southern United States was used to explore the meaning of
landscape connectivity and its application in town planning. The vast
rural landscape in the southern United States is conspicuously
characterized by the hedgerow trees or groves. The patchwork
landscape of fields surrounded by high hedgerows is a traditional and
familiar feature of the American countryside. Hedgerows are in effect
linear strips of trees, groves, or woodlands, which are often critical
habitats for wildlife and important for the visual quality of the
landscape. Based on geographic information system (GIS) and
statistical analysis (FRAGSTAT), this study attempts to quantify the
landscape connectivity characterized by hedgerows in south Alabama
where substantial areas of authentic hedgerow landscape are being
urbanized due to the ever expanding real estate industry and high
demand for new residential development. The results of this study
shed light on how to balance the needs of new urban development and
biodiversity conservation by maintaining a higher level of landscape
connectivity, and thus will inform design interventions.
Abstract: XML has become a popular standard for information exchange via the web. Each XML document can be represented as a rooted, ordered, labeled tree. The node label shows the exact position of a node in the original document. Region and Dewey encoding are two well-known methods of labeling trees. In this paper, we propose a new insert-friendly labeling method named IFDewey, based on a recently proposed scheme called Extended Dewey. In Extended Dewey, many labels must be modified when a new node is inserted into the XML tree. Our method eliminates this problem by reserving even numbers for future insertions. Numbers generated by Extended Dewey may be even or odd. IFDewey modifies Extended Dewey so that only odd numbers are generated, and even numbers can then be used for much easier insertion of nodes.
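The reservation idea can be illustrated with a deliberately simplified sketch. It assumes plain Dewey-style labels stored as tuples of integers and leaves out the way Extended Dewey encodes element names in label components, so it is not IFDewey itself; the function names are hypothetical.

```python
def initial_child_labels(parent_label, count):
    """Label `count` children with odd components 1, 3, 5, ..."""
    return [parent_label + (2 * i + 1,) for i in range(count)]


def label_between(left, right):
    """Label for a node inserted between two adjacent siblings.

    Uses the reserved even number when one is free between the two
    components; returns None when no gap is left (not handled here).
    """
    if left[:-1] != right[:-1]:
        raise ValueError("labels must be siblings")
    candidate = left[-1] + 1
    if candidate < right[-1] and candidate % 2 == 0:
        return left[:-1] + (candidate,)
    return None


children = initial_child_labels((1,), 3)
print(children)                                   # prints [(1, 1), (1, 3), (1, 5)]
print(label_between(children[1], children[2]))    # prints (1, 4)
```

The new node takes label (1, 4) and no existing label changes. Repeated insertions at the same position exhaust the single reserved slot, a case this sketch only reports and does not resolve.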
Abstract: The aim of this paper is to identify the most suitable
model for churn prediction based on three different techniques. The
paper identifies the variables that affect churn with reference to
customer complaints data and provides a comparative analysis of
neural networks, regression trees and regression in their capabilities
of predicting customer churn.
Abstract: In this paper, we present a new learning algorithm for
anomaly-based network intrusion detection using an improved
self-adaptive naïve Bayesian tree (NBTree), which induces a hybrid of
decision tree and naïve Bayesian classifier. The proposed approach
scales up balanced detection for different attack types and keeps
false positives at an acceptable level in intrusion detection. In
complex and dynamic large intrusion detection datasets, the detection
accuracy of the naïve Bayesian classifier does not scale up as well as
that of the decision tree. It has been successfully shown in other problem
domains that the naïve Bayesian tree improves classification rates on
large datasets. In a naïve Bayesian tree, the nodes contain splits as in
regular decision trees, but the leaves contain naïve Bayesian
classifiers. The experimental results on the KDD99 benchmark network
intrusion detection dataset demonstrate that this new approach scales
up the detection rates for different attack types and reduces false
positives in network intrusion detection.
Abstract: Recently, machine condition monitoring
and fault diagnosis as part of the maintenance system has gained global attention
due to the potential advantages to be gained from reduced
maintenance costs, improved productivity and increased machine
availability. The aim of this work is to investigate the effectiveness
of a new fault diagnosis method based on power spectral density
(PSD) of vibration signals in combination with decision trees and
fuzzy inference system (FIS). To this end, a series of studies was
conducted on an external gear hydraulic pump. After a test under
normal condition, a number of different machine defect conditions
were introduced for three working levels of pump speed (1000, 1500,
and 2000 rpm), corresponding to (i) Journal-bearing with inner face
wear (BIFW), (ii) Gear with tooth face wear (GTFW), and (iii)
Journal-bearing with inner face wear plus Gear with tooth face wear
(B&GW). The features of the PSD values of the vibration signals were
extracted using descriptive statistical parameters. The J48 algorithm was
used as a feature selection procedure to select pertinent features from
the data set. The output of the J48 algorithm was employed to produce the
crisp if-then rules and membership function sets. The structure of the FIS
classifier was then defined based on the crisp sets. In order to
evaluate the proposed PSD-J48-FIS model, the data sets obtained
from vibration signals of the pump were used. Results showed that
the total classification accuracies for the 1000, 1500, and 2000 rpm
conditions were 96.42%, 100%, and 96.42%, respectively. The results
indicate that the combined PSD-J48-FIS model has the potential for
fault diagnosis of hydraulic pumps.