Abstract: This paper proposes a novel hybrid algorithm for feature selection based on a binary ant colony and SVM. The final subset selection is attained through the elimination of the features that produce noise or, are strictly correlated with other already selected features. Our algorithm can improve classification accuracy with a small and appropriate feature subset. Proposed algorithm is easily implemented and because of use of a simple filter in that, its computational complexity is very low. The performance of the proposed algorithm is evaluated through a real Rotary Cement kiln dataset. The results show that our algorithm outperforms existing algorithms.
Abstract: Fault-proneness of a software module is the
probability that the module contains faults. A correlation exists
between the fault-proneness of the software and the measurable
attributes of the code (i.e. the static metrics) and of the testing (i.e.
the dynamic metrics). Early detection of fault-prone software
components enables verification experts to concentrate their time and
resources on the problem areas of the software system under
development. This paper introduces Genetic Algorithm based
software fault prediction models with Object-Oriented metrics. The
contribution of this paper is that it has used Metric values of JEdit
open source software for generation of the rules for the classification
of software modules in the categories of Faulty and non faulty
modules and thereafter empirically validation is performed. The
results shows that Genetic algorithm approach can be used for
finding the fault proneness in object oriented software components.
Abstract: Until recently, researchers have developed various
tools and methodologies for effective clinical decision-making.
Among those decisions, chest pain diseases have been one of
important diagnostic issues especially in an emergency department. To
improve the ability of physicians in diagnosis, many researchers have
developed diagnosis intelligence by using machine learning and data
mining. However, most of the conventional methodologies have been
generally based on a single classifier for disease classification and
prediction, which shows moderate performance. This study utilizes an
ensemble strategy to combine multiple different classifiers to help
physicians diagnose chest pain diseases more accurately than ever.
Specifically the ensemble strategy is applied by using the integration
of decision trees, neural networks, and support vector machines. The
ensemble models are applied to real-world emergency data. This study
shows that the performance of the ensemble models is superior to each
of single classifiers.
Abstract: Purpose: To explore the use of Curvelet transform to
extract texture features of pulmonary nodules in CT image and support
vector machine to establish prediction model of small solitary
pulmonary nodules in order to promote the ratio of detection and
diagnosis of early-stage lung cancer. Methods: 2461 benign or
malignant small solitary pulmonary nodules in CT image from 129
patients were collected. Fourteen Curvelet transform textural features
were as parameters to establish support vector machine prediction
model. Results: Compared with other methods, using 252 texture
features as parameters to establish prediction model is more proper.
And the classification consistency, sensitivity and specificity for the
model are 81.5%, 93.8% and 38.0% respectively. Conclusion: Based
on texture features extracted from Curvelet transform, support vector
machine prediction model is sensitive to lung cancer, which can
promote the rate of diagnosis for early-stage lung cancer to some
extent.
Abstract: A novel method of learning complex fuzzy decision regions in the n-dimensional feature space is proposed. Through the fuzzy decision regions, a given pattern's class membership value of every class is determined instead of the conventional crisp class the pattern belongs to. The n-dimensional fuzzy decision region is approximated by union of hyperellipsoids. By explicitly parameterizing these hyperellipsoids, the decision regions are determined by estimating the parameters of each hyperellipsoid.Genetic Algorithm is applied to estimate the parameters of each region component. With the global optimization ability of GA, the learned decision region can be arbitrarily complex.
Abstract: An automated wood recognition system is designed to
classify tropical wood species.The wood features are extracted based
on two feature extractors: Basic Grey Level Aura Matrix (BGLAM)
technique and statistical properties of pores distribution (SPPD)
technique. Due to the nonlinearity of the tropical wood species
separation boundaries, a pre classification stage is proposed which
consists ofKmeans clusteringand kernel discriminant analysis (KDA).
Finally, Linear Discriminant Analysis (LDA) classifier and KNearest
Neighbour (KNN) are implemented for comparison purposes.
The study involves comparison of the system with and without pre
classification using KNN classifier and LDA classifier.The results
show that the inclusion of the pre classification stage has improved
the accuracy of both the LDA and KNN classifiers by more than
12%.
Abstract: Feature selection has recently been the subject of intensive research in data mining, specially for datasets with a large number of attributes. Recent work has shown that feature selection can have a positive effect on the performance of machine learning algorithms. The success of many learning algorithms in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process phase can result in poor predictive performance and increased computation. In this paper, a novel feature search procedure that utilizes the Ant Colony Optimization (ACO) is presented. The ACO is a metaheuristic inspired by the behavior of real ants in their search for the shortest paths to food sources. It looks for optimal solutions by considering both local heuristics and previous knowledge. When applied to two different classification problems, the proposed algorithm achieved very promising results.
Abstract: This paper presents a new strategy of identification
and classification of pathological voices using the hybrid method
based on wavelet transform and neural networks. After speech
acquisition from a patient, the speech signal is analysed in order to
extract the acoustic parameters such as the pitch, the formants, Jitter,
and shimmer. Obtained results will be compared to those normal and
standard values thanks to a programmable database. Sounds are
collected from normal people and patients, and then classified into
two different categories. Speech data base is consists of several
pathological and normal voices collected from the national hospital
“Rabta-Tunis". Speech processing algorithm is conducted in a
supervised mode for discrimination of normal and pathology voices
and then for classification between neural and vocal pathologies
(Parkinson, Alzheimer, laryngeal, dyslexia...). Several simulation
results will be presented in function of the disease and will be
compared with the clinical diagnosis in order to have an objective
evaluation of the developed tool.
Abstract: A new approach to promote the generalization ability
of neural networks is presented. It is based on the point of view of
fuzzy theory. This approach is implemented through shrinking or
magnifying the input vector, thereby reducing the difference between
training set and testing set. It is called “shrinking-magnifying
approach" (SMA). At the same time, a new algorithm; α-algorithm is
presented to find out the appropriate shrinking-magnifying-factor
(SMF) α and obtain better generalization ability of neural networks.
Quite a few simulation experiments serve to study the effect of SMA
and α-algorithm. The experiment results are discussed in detail, and
the function principle of SMA is analyzed in theory. The results of
experiments and analyses show that the new approach is not only
simpler and easier, but also is very effective to many neural networks
and many classification problems. In our experiments, the proportions
promoting the generalization ability of neural networks have even
reached 90%.
Abstract: Text Mining is around applying knowledge discovery
techniques to unstructured text is termed knowledge discovery in text
(KDT), or Text data mining or Text Mining. In decision tree
approach is most useful in classification problem. With this
technique, tree is constructed to model the classification process.
There are two basic steps in the technique: building the tree and
applying the tree to the database. This paper describes a proposed
C5.0 classifier that performs rulesets, cross validation and boosting
for original C5.0 in order to reduce the optimization of error ratio.
The feasibility and the benefits of the proposed approach are
demonstrated by means of medial data set like hypothyroid. It is
shown that, the performance of a classifier on the training cases from
which it was constructed gives a poor estimate by sampling or using a
separate test file, either way, the classifier is evaluated on cases that
were not used to build and evaluate the classifier are both are large. If
the cases in hypothyroid.data and hypothyroid.test were to be
shuffled and divided into a new 2772 case training set and a 1000
case test set, C5.0 might construct a different classifier with a lower
or higher error rate on the test cases. An important feature of see5 is
its ability to classifiers called rulesets. The ruleset has an error rate
0.5 % on the test cases. The standard errors of the means provide an
estimate of the variability of results. One way to get a more reliable
estimate of predictive is by f-fold –cross- validation. The error rate of
a classifier produced from all the cases is estimated as the ratio of the
total number of errors on the hold-out cases to the total number of
cases. The Boost option with x trials instructs See5 to construct up to
x classifiers in this manner. Trials over numerous datasets, large and
small, show that on average 10-classifier boosting reduces the error
rate for test cases by about 25%.
Abstract: In the recent past, there has been an increasing interest
in applying evolutionary methods to Knowledge Discovery in
Databases (KDD) and a number of successful applications of Genetic
Algorithms (GA) and Genetic Programming (GP) to KDD have been
demonstrated. The most predominant representation of the
discovered knowledge is the standard Production Rules (PRs) in the
form If P Then D. The PRs, however, are unable to handle
exceptions and do not exhibit variable precision. The Censored
Production Rules (CPRs), an extension of PRs, were proposed by
Michalski & Winston that exhibit variable precision and supports an
efficient mechanism for handling exceptions. A CPR is an
augmented production rule of the form:
If P Then D Unless C, where C (Censor) is an exception to the rule.
Such rules are employed in situations, in which the conditional
statement 'If P Then D' holds frequently and the assertion C holds
rarely. By using a rule of this type we are free to ignore the exception
conditions, when the resources needed to establish its presence are
tight or there is simply no information available as to whether it
holds or not. Thus, the 'If P Then D' part of the CPR expresses
important information, while the Unless C part acts only as a switch
and changes the polarity of D to ~D.
This paper presents a classification algorithm based on evolutionary
approach that discovers comprehensible rules with exceptions in the
form of CPRs.
The proposed approach has flexible chromosome encoding, where
each chromosome corresponds to a CPR. Appropriate genetic
operators are suggested and a fitness function is proposed that
incorporates the basic constraints on CPRs. Experimental results are
presented to demonstrate the performance of the proposed algorithm.
Abstract: Principle component analysis is often combined with
the state-of-art classification algorithms to recognize human faces.
However, principle component analysis can only capture these
features contributing to the global characteristics of data because it is a
global feature selection algorithm. It misses those features
contributing to the local characteristics of data because each principal
component only contains some levels of global characteristics of data.
In this study, we present a novel face recognition approach using
non-negative principal component analysis which is added with the
constraint of non-negative to improve data locality and contribute to
elucidating latent data structures. Experiments are performed on the
Cambridge ORL face database. We demonstrate the strong
performances of the algorithm in recognizing human faces in
comparison with PCA and NREMF approaches.
Abstract: Heart disease (HD) is a major cause of morbidity and mortality in the modern society. Medical diagnosis is an important but complicated task that should be performed accurately and efficiently and its automation would be very useful. All doctors are unfortunately not equally skilled in every sub specialty and they are in many places a scarce resource. A system for automated medical diagnosis would enhance medical care and reduce costs. In this paper, a new approach based on coactive neuro-fuzzy inference system (CANFIS) was presented for prediction of heart disease. The proposed CANFIS model combined the neural network adaptive capabilities and the fuzzy logic qualitative approach which is then integrated with genetic algorithm to diagnose the presence of the disease. The performances of the CANFIS model were evaluated in terms of training performances and classification accuracies and the results showed that the proposed CANFIS model has great potential in predicting the heart disease.
Abstract: Shot boundary detection is a fundamental step for the organization of large video data. In this paper, we propose a new method for video gradual shots detection and classification, using advantages of fractal analysis and AIS-based classifier. Proposed features are “vertical intercept" and “fractal dimension" of each frame of videos which are computed using Fourier transform coefficients. We also used a classifier based on Clonal Selection Algorithm. We have carried out our solution and assessed it according to the TRECVID2006 benchmark dataset.
Abstract: A retrospective study was undertaken to record the
occurrence and pattern of fractures in small animals (dogs and cats)
from year 2005 to 2010. A total of 650 cases were presented in small
animal surgery unit out of which of 116 (dogs and cats) were
presented with history of fractures of different bones. A total of
17.8% (116/650) cases were of fractures which constituted dogs 67%
while cats were 23%. The majority of animals were intact. Trauma in
the form of road side accident was the principal cause of fractures in
dogs whereas as in cats it was fall from height. The ages of the
fractured dog ranged from 4 months to 12 years whereas in cat it was
from 4 weeks to 10 years. The femoral fractures represented 37.5%
and 25% respectively in dogs and cats. Diaphysis, distal metaphyseal
and supracondylar fractures were the most affected sites in dog and
cats. Tibial fracture in dogs and cats represented 21.5% and 10%
while humoral fractures were 7.9% and 14% in dogs and cats
respectively. Humoral condyler fractures were most commonly seen
in puppies aged 4 to 6 months. Fractured radius-ulna incidence was
19% and 14% in dogs and cats respectively. Other fractures recorded
were of lumbar vertebrae, mandible and metacarpals etc. The
management comprised of external and internal fixation in both the
species. The most common internal fixation technique employed was
Intramedullary fixation in long followed by other methods like stack
or cross pinning, wiring etc as per findings in the cases. The cast
bandage was used majorly as mean for external coaptation. The
paper discusses the outcome of the case as per the technique
employed.
Abstract: A large number of semantic web service composition
approaches are developed by the research community and one is
more efficient than the other one depending on the particular
situation of use. So a close look at the requirements of ones particular
situation is necessary to find a suitable approach to use. In this paper,
we present a Technique Recommendation System (TRS) which using
a classification of state-of-art semantic web service composition
approaches, can provide the user of the system with the
recommendations regarding the use of service composition approach
based on some parameters regarding situation of use. TRS has
modular architecture and uses the production-rules for knowledge
representation.
Abstract: Granular computing deals with representation of information in the form of some aggregates and related methods for transformation and analysis for problem solving. A granulation scheme based on clustering and Rough Set Theory is presented with focus on structured conceptualization of information has been presented in this paper. Experiments for the proposed method on four labeled data exhibit good result with reference to classification problem. The proposed granulation technique is semi-supervised imbibing global as well as local information granulation. To represent the results of the attribute oriented granulation a tree structure is proposed in this paper.
Abstract: Brain Computer Interface (BCI) has been recently
increased in research. Functional Near Infrared Spectroscope (fNIRs)
is one the latest technologies which utilize light in the near-infrared
range to determine brain activities. Because near infrared technology
allows design of safe, portable, wearable, non-invasive and wireless
qualities monitoring systems, fNIRs monitoring of brain
hemodynamics can be value in helping to understand brain tasks. In
this paper, we present results of fNIRs signal analysis indicating that
there exist distinct patterns of hemodynamic responses which
recognize brain tasks toward developing a BCI. We applied two
different mathematics tools separately, Wavelets analysis for
preprocessing as signal filters and feature extractions and Neural
networks for cognition brain tasks as a classification module. We
also discuss and compare with other methods while our proposals
perform better with an average accuracy of 99.9% for classification.
Abstract: Speckled images arise when coherent microwave,
optical, and acoustic imaging techniques are used to image an object, surface or scene. Examples of coherent imaging systems include synthetic aperture radar, laser imaging systems, imaging sonar
systems, and medical ultrasound systems. Speckle noise is a form of object or target induced noise that results when the surface of the object is Rayleigh rough compared to the wavelength of the illuminating radiation. Detection and estimation in images corrupted
by speckle noise is complicated by the nature of the noise and is not
as straightforward as detection and estimation in additive noise. In
this work, we derive stochastic models for speckle noise, with an emphasis on speckle as it arises in medical ultrasound images. The
motivation for this work is the problem of segmentation and tissue classification using ultrasound imaging. Modeling of speckle in this
context involves partially developed speckle model where an underlying Poisson point process modulates a Gram-Charlier series
of Laguerre weighted exponential functions, resulting in a doubly
stochastic filtered Poisson point process. The statistical distribution of partially developed speckle is derived in a closed canonical form.
It is observed that as the mean number of scatterers in a resolution cell is increased, the probability density function approaches an
exponential distribution. This is consistent with fully developed speckle noise as demonstrated by the Central Limit theorem.
Abstract: The automatic classification of non stationary signals is an important practical goal in several domains. An essential classification task is to allocate the incoming signal to a group associated with the kind of physical phenomena producing it. In this paper, we present a modular system composed by three blocs: 1) Representation, 2) Dimensionality reduction and 3) Classification. The originality of our work consists in the use of a new wavelet called "Ben wavelet" in the representation stage. For the dimensionality reduction, we propose a new algorithm based on the random projection and the principal component analysis.