Abstract: Applying knowledge discovery techniques to unstructured
text is termed knowledge discovery in text (KDT), text data mining, or
text mining. The decision tree approach is most useful in
classification problems. With this technique, a tree is constructed to
model the classification process. There are two basic steps in the
technique: building the tree and applying the tree to the database.
This paper describes a proposed C5.0 classifier that adds rulesets,
cross-validation and boosting to the original C5.0 in order to reduce
the error rate. The feasibility and the benefits of the proposed
approach are demonstrated by means of a medical data set,
hypothyroid. It is shown that the performance of a classifier on the
training cases from which it was constructed gives a poor estimate of
its accuracy on new cases. Accuracy can instead be estimated by
sampling or by using a separate test file; either way, the classifier
is evaluated on cases that were not used to build it, and the estimate
is reliable only if the numbers of cases used to build and to evaluate
the classifier are both large. If the cases in hypothyroid.data and
hypothyroid.test were to be shuffled and divided into a new 2772-case
training set and a 1000-case test set, C5.0 might construct a
different classifier with a lower or higher error rate on the test
cases. An important feature of See5 is its ability to generate
classifiers called rulesets. The ruleset has an error rate of 0.5% on
the test cases. The standard errors of the means provide an estimate
of the variability of results. One way to get a more reliable estimate
of predictive accuracy is f-fold cross-validation. The error rate of
a classifier produced from all the cases is estimated as the ratio of the
total number of errors on the hold-out cases to the total number of
cases. The Boost option with x trials instructs See5 to construct up to
x classifiers in this manner. Trials over numerous datasets, large and
small, show that on average 10-classifier boosting reduces the error
rate for test cases by about 25%.
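The f-fold cross-validation estimate described in the abstract above (total hold-out errors divided by total cases) can be sketched as follows. This is only an illustration: `train_majority` is a hypothetical stand-in learner rather than C5.0, the toy cases are made up, and the cases are assigned to folds by position instead of being shuffled first.

```python
from collections import Counter

def train_majority(cases):
    """Hypothetical stand-in learner: always predicts the most common class."""
    majority = Counter(label for _, label in cases).most_common(1)[0][0]
    return lambda features: majority

def cross_validation_error(cases, folds, train=train_majority):
    """Estimate the error rate by f-fold cross-validation.

    Each case is held out exactly once. The estimate is the total number
    of errors on the hold-out cases divided by the total number of cases.
    """
    errors = 0
    for k in range(folds):
        holdout = cases[k::folds]  # every folds-th case, starting at index k
        training = [c for i, c in enumerate(cases) if i % folds != k]
        classifier = train(training)
        errors += sum(classifier(x) != y for x, y in holdout)
    return errors / len(cases)

# Toy data (made up): 8 "negative" cases followed by 2 "positive" cases.
toy_cases = [((i,), "negative") for i in range(8)] + \
            [((i,), "positive") for i in range(2)]
toy_error = cross_validation_error(toy_cases, folds=5)
```

Any function that takes a list of `(features, label)` pairs and returns a callable classifier can be passed as `train`.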
Abstract: Shot boundary detection is a fundamental step in the organization of large video data. In this paper, we propose a new method for gradual shot detection and classification in video, exploiting the advantages of fractal analysis and an AIS-based classifier. The proposed features are the "vertical intercept" and "fractal dimension" of each video frame, which are computed using Fourier transform coefficients. We also use a classifier based on the Clonal Selection Algorithm. We implemented our solution and assessed it on the TRECVID2006 benchmark dataset.
Abstract: This paper focuses on the data-driven generation
of fuzzy IF...THEN rules. The resulting fuzzy rule base can be
applied to build a classifier, a model used for prediction, or
it can be applied to form a decision support system. Among
the wide range of possible approaches, the decision tree and
the association rule based algorithms are overviewed, and two
new approaches are presented based on the a priori fuzzy
clustering based partitioning of the continuous input variables.
An application study is also presented, where the developed
methods are tested on the well-known Wisconsin Breast Cancer
classification problem.
Abstract: Intrusion detection is a mechanism used to protect a
system and analyse and predict the behaviours of system users. An
ideal intrusion detection system is hard to achieve due to
nonlinearity, and irrelevant or redundant features. This study
introduces a new anomaly-based intrusion detection model. The
suggested model is based on particle swarm optimisation and
nonlinear, multi-class and multi-kernel support vector machines.
Particle swarm optimisation is used for feature selection by applying
a new formula to update the position and the velocity of a particle;
the support vector machine is used as a classifier. The proposed
model is tested and compared with other methods using the KDD
CUP 1999 dataset. The results indicate that this new method achieves
better accuracy rates than previous methods.
Abstract: The electroencephalogram (EEG) signal is one of the most widely used signals in the bioinformatics field due to its rich information about human tasks. In this work, EEG wave classification is achieved using the Discrete Wavelet Transform (DWT) with the Fast Fourier Transform (FFT) on normalized EEG data. The DWT is used as a classifier of the EEG wave frequencies, while the FFT is implemented to visualize the EEG waves at the multiple resolutions of the DWT. Several real EEG data sets (from both normal and abnormal persons) have been tested, and the results support the validity of the proposed technique.
Abstract: In this paper, a second order autoregressive (AR)
model is proposed to discriminate alcoholics using single trial
gamma band Visual Evoked Potential (VEP) signals using 3 different
classifiers: Simplified Fuzzy ARTMAP (SFA) neural network (NN),
Multilayer-perceptron-backpropagation (MLP-BP) NN and Linear
Discriminant (LD). Electroencephalogram (EEG) signals were
recorded from alcoholic and control subjects during the presentation
of pictures from the Snodgrass and Vanderwart picture set. Single trial
VEP signals were extracted from EEG signals using Elliptic filtering
in the gamma band spectral range. A second order AR model was
used as gamma band VEP exhibits pseudo-periodic behaviour and
second order AR is optimal to represent this behaviour. This
circumvents the requirement of having to use some criteria to choose
the correct order. Averaged discrimination errors of 2.6%, 2.8%
and 11.9% were given by the LD, MLP-BP and SFA classifiers,
respectively. The high LD discrimination results show the validity of
the proposed method to discriminate between alcoholic and control
subjects.
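As an illustration of the second order AR model mentioned in the abstract above, the sketch below fits x[n] = a1*x[n-1] + a2*x[n-2] + e[n] by ordinary least squares. The abstract does not say how the coefficients were estimated, so the least-squares fit and the name `fit_ar2` are assumptions made only for this example.

```python
import math

def fit_ar2(signal):
    """Least-squares fit of x[n] = a1*x[n-1] + a2*x[n-2] + e[n].

    Solves the 2x2 normal equations directly and returns (a1, a2).
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for n in range(2, len(signal)):
        x0, x1, x2 = signal[n], signal[n - 1], signal[n - 2]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x0 * x1
        b2 += x0 * x2
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("signal does not determine a unique AR(2) model")
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (b2 * s11 - b1 * s12) / det
    return a1, a2

# A pure cosine is exactly periodic and satisfies
# x[n] = 2*cos(w)*x[n-1] - x[n-2], so the fit should recover those values.
w = 0.5
cosine = [math.cos(w * n) for n in range(50)]
a1, a2 = fit_ar2(cosine)
```

The two coefficients `(a1, a2)` would then serve as the feature pair passed to a classifier.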
Abstract: This paper presents the prediction of kidney
dysfunction using different neural network (NN) approaches.
Self-Organizing Maps (SOM), Probabilistic Neural Network (PNN) and
Multi Layer Perceptron Neural Network (MLPNN) trained with the Back
Propagation Algorithm (BPA) are used in this study. Six hundred and
sixty-three sets of analytical laboratory tests were collected from
one of the private clinical laboratories in Baghdad. For each subject,
serum urea and serum creatinine levels were analyzed and tested
using clinical laboratory measurements. The collected urea and
creatinine levels are then used as inputs to the three NN models, each
of which is trained by a different neural approach. SOM is an
unsupervised network, whereas PNN and BPNN are supervised
networks. These networks are used as classifiers to predict whether
a kidney is normal or will have a dysfunction. The accuracy of
prediction, sensitivity and specificity were found for each type of
the proposed networks. We conclude that PNN gives faster and more
accurate prediction of kidney dysfunction and is a promising tool for
routinely predicting kidney dysfunction from clinical laboratory data.
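The accuracy, sensitivity and specificity reported for each network in the abstract above follow the standard definitions, sketched below. The label names and the toy predictions are made up for illustration and are not taken from the study.

```python
def confusion_metrics(actual, predicted, positive="dysfunction"):
    """Return accuracy, sensitivity and specificity for a two-class problem."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1   # dysfunction correctly detected
            else:
                fn += 1   # dysfunction missed
        else:
            if p == positive:
                fp += 1   # normal kidney flagged as dysfunction
            else:
                tn += 1   # normal kidney correctly cleared
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Toy labels (made up): 4 dysfunction cases and 6 normal cases.
actual = ["dysfunction"] * 4 + ["normal"] * 6
predicted = ["dysfunction"] * 3 + ["normal"] * 5 + ["dysfunction", "normal"]
metrics = confusion_metrics(actual, predicted)
```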
Abstract: This paper presents a new method to detect high impedance faults in radial distribution systems. Magnitudes of third and fifth harmonic components of voltages and currents are used as a feature vector for fault discrimination. The proposed methodology uses a learning vector quantization (LVQ) neural network as a classifier for identifying high impedance arc-type faults. The network learns from the data obtained from simulation of a simple radial system under different fault and system conditions. Compared to a feed-forward neural network, a properly tuned LVQ network gives a quicker response.
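One way to obtain the harmonic magnitudes used as features in the abstract above is a single-bin discrete Fourier transform. The sketch below assumes that the samples cover exactly one cycle of the fundamental. The function names and the ordering of the feature vector are hypothetical, and the waveforms are synthetic.

```python
import cmath
import math

def harmonic_magnitude(samples, harmonic):
    """Amplitude of one harmonic, assuming the samples span exactly one
    cycle of the fundamental frequency."""
    n = len(samples)
    coeff = sum(x * cmath.exp(-2j * math.pi * harmonic * k / n)
                for k, x in enumerate(samples))
    return 2 * abs(coeff) / n

def fault_features(voltage, current):
    """Feature vector [V3, V5, I3, I5]; this ordering is an assumption."""
    return [harmonic_magnitude(voltage, 3), harmonic_magnitude(voltage, 5),
            harmonic_magnitude(current, 3), harmonic_magnitude(current, 5)]

# Synthetic one-cycle waveforms with known third and fifth harmonic content.
N = 64
theta = [2 * math.pi * k / N for k in range(N)]
voltage = [math.sin(t) + 0.2 * math.sin(3 * t) + 0.05 * math.sin(5 * t)
           for t in theta]
current = [math.sin(t) + 0.1 * math.sin(3 * t) for t in theta]
features = fault_features(voltage, current)
```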
Abstract: In this study, a classification-based video
super-resolution method using an artificial neural network (ANN) is
proposed to enhance low-resolution (LR) frames to high-resolution (HR)
frames. The proposed method consists of four main steps:
classification, motion-trace volume collection, temporal adjustment,
and ANN prediction. A classifier is designed based on the edge
properties of a pixel in the LR frame to identify the spatial information.
To exploit the spatio-temporal information, a motion-trace volume is
collected using motion estimation, which can eliminate unfathomable
object motion in the LR frames. In addition, a temporal lateral process is
employed for volume adjustment to reduce unnecessary temporal
features. Finally, an ANN is applied to each class to learn the complicated
spatio-temporal relationship between LR and HR frames. Simulation
results show that the proposed method successfully improves both
peak signal-to-noise ratio and perceptual quality.
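The peak signal-to-noise ratio used as a quality measure in the abstract above has a standard definition, sketched below for frames stored as nested lists of pixel values. The 8-bit peak value of 255 is an assumption, and the frames are toy data.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy 2x2 frames: every estimated pixel is off by exactly one grey level.
hr_frame = [[10, 20], [30, 40]]
sr_frame = [[11, 21], [31, 41]]
quality = psnr(hr_frame, sr_frame)
```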
Abstract: Sleep stage scoring is the process of classifying the
stage of sleep that the subject is in. Sleep is classified into
two states based on the constellation of physiological parameters.
The two states are non-rapid eye movement (NREM) and
rapid eye movement (REM). NREM sleep is further classified into
four stages (1-4). These states and the state of wakefulness are
distinguished from each other based on brain activity. In this
work, a classification method for automated sleep stage scoring
based on a single EEG recording using wavelet packet decomposition
was implemented. Thirty-two polysomnographic recordings from the
MIT-BIH database were used for training and validation of the
proposed method. A single EEG recording was extracted and
smoothed using a Savitzky-Golay filter. Wavelet packet
decomposition up to the fourth level based on a 20th-order Daubechies
filter was used to extract features from the EEG signal. A feature
vector of 54 features was formed. It was reduced to a size of 25 using
the gain ratio method and fed into a regression tree classifier. The
regression trees were trained using 67% of the records available. The
records for training were selected based on cross-validation of the
records. The remaining records were used for testing the
classifier. The overall correct rate of the proposed method was found
to be around 75%, which is acceptable compared to the techniques in
the literature.
Abstract: Since test costs in today's semiconductor industry can make up as much as 50 percent of total production costs, efficient test error detection is becoming increasingly important. In this paper, we present a new machine learning approach to test error detection that should provide faster recognition of test system faults as well as improved test error recall. The key idea is to learn a classifier ensemble that detects typical test error patterns in wafer test results immediately after the tests finish. Since test error detection has not yet been discussed in the machine learning community, we define central problem-relevant terms and provide an analysis of important domain properties. Finally, we present comparative studies reflecting the failure detection performance of three individual classifiers and three ensemble methods based upon them. As base classifiers we chose a decision tree learner, a support vector machine and a Bayesian network, while the compared ensemble methods were simple and weighted majority vote as well as stacking. For the evaluation, we used cross-validation and a specially designed practical simulation. By implementing our approach in a semiconductor test department for the observation of two products, we proved its practical applicability.
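Two of the ensemble methods named in the abstract above, simple and weighted majority vote, can be sketched in a few lines. The labels and weights below are made up, how the weights would be derived is not stated in the abstract, and stacking is not shown.

```python
from collections import defaultdict

def majority_vote(predictions, weights=None):
    """Combine base-classifier predictions by (weighted) majority vote.

    With weights=None every classifier counts equally (simple vote).
    Ties go to the label that was seen first.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Hypothetical outputs of a decision tree, an SVM and a Bayesian network.
votes = ["pass", "test_error", "pass"]
simple = majority_vote(votes)
# Hypothetical weights, e.g. each classifier's validation accuracy.
weighted = majority_vote(votes, weights=[0.2, 0.9, 0.3])
```

In this toy case the simple vote returns the label chosen by two of the three classifiers, while the weighted vote follows the single classifier with the largest weight.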
Abstract: This work presents an approach for the construction of a hybrid color-texture space by using mutual information. Feature extraction is done by the Laws filter, with an SVM (Support Vector Machine) as the classifier. The classification is applied to the VisTex database and to a SPOT HRV (XS) image representing two forest areas in the region of Rabat in Morocco. The classification result obtained in the hybrid space is compared with the one obtained in the RGB color space.
Abstract: In this paper, a new learning approach for network
intrusion detection using a naïve Bayesian classifier and the ID3
algorithm is presented, which identifies effective attributes from the
training dataset, calculates the conditional probabilities for the best
attribute values, and then correctly classifies all the examples of the
training and testing datasets. Most current intrusion detection datasets
are dynamic, complex and contain a large number of attributes. Some of
the attributes may be redundant or contribute little to detection.
Tests have shown that selecting significant attributes is
important for designing real-world intrusion detection
systems (IDS). The purpose of this study is to identify effective
attributes from the training dataset to build a classifier for network
intrusion detection using data mining algorithms. The experimental
results on the KDD99 benchmark intrusion detection dataset demonstrate
that this new approach achieves high classification rates and reduces
false positives using limited computational resources.
Abstract: The problem of spam has been seriously troubling the Internet community during the last few years and has now reached an alarming scale. Observations made at CERN (European Organization for Nuclear Research, located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier based on a Bag of Words representation of an email is widely used to stop this unwanted flood, as it combines good performance with simplicity of the training and classification processes. However, facing the constantly changing patterns of spam, it is necessary to ensure online adaptability of the classifier. This work proposes combining such a classifier with another NBC (naïve Bayesian classifier) based on pairs of adjacent words. Only the latter will be retrained with examples of spam reported by users. Tests are performed on considerable sets of mails both from public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without affecting the classifier precision, as happens when only the NBC based on single words is retrained.
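The two-classifier architecture in the abstract above can be illustrated with a small naïve Bayes sketch: one model over single words and one over pairs of adjacent words. The abstract does not say how the two scores are combined, so summing their log-odds here is an assumption, as are the add-one smoothing, the class and function names, and the toy messages.

```python
import math
from collections import Counter

def single_words(text):
    """Bag of Words features."""
    return text.lower().split()

def adjacent_pairs(text):
    """Features built from pairs of adjacent words."""
    words = text.lower().split()
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

class NaiveBayes:
    """Minimal two-class naive Bayes with add-one smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text, label):
        self.counts[label].update(self.featurize(text))
        self.docs[label] += 1

    def log_odds(self, text):
        """Smoothed log-odds of spam versus ham; positive favours spam."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for feature in self.featurize(text):
            for label, sign in (("spam", 1), ("ham", -1)):
                total = sum(self.counts[label].values())
                likelihood = (self.counts[label][feature] + 1) / (total + vocab)
                score += sign * math.log(likelihood)
        return score

def is_spam(text, word_nbc, pair_nbc):
    # Assumption: the two classifiers are combined by summing log-odds.
    return word_nbc.log_odds(text) + pair_nbc.log_odds(text) > 0

word_nbc = NaiveBayes(single_words)
pair_nbc = NaiveBayes(adjacent_pairs)
for model in (word_nbc, pair_nbc):
    model.train("cheap pills online", "spam")
    model.train("meeting agenda attached", "ham")

# Only the word-pair model is retrained with a user-reported spam example.
pair_nbc.train("win cheap pills", "spam")
```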
Abstract: Face detection and recognition have many applications
in a variety of fields such as security systems, videoconferencing and
identification. Face classification is currently implemented in
software. A hardware implementation allows real-time processing,
but has a higher cost and time-to-market.
The objective of this work is to implement a classifier based on
an MLP (Multi-layer Perceptron) neural network for face detection.
The MLP is used to classify face and non-face patterns. The system is
described in the C language on a P4 (2.4 GHz) to extract the weight
values. A hardware implementation is then achieved using a
VHDL-based methodology. We target a Xilinx FPGA as the implementation
support.
Abstract: Linear discriminant analysis (LDA) can be readily
generalized into a nonlinear form, kernel LDA (KLDA),
by using kernel functions. However, KLDA often leads to a generalized
eigenvalue problem in the singular case. To avoid this complication, this
paper proposes an iterative algorithm for the two-class KLDA. The
proposed KLDA is used as a nonlinear discriminant classifier, and the
experiments show that its performance is comparable to that of SVM.
Abstract: The fake finger submission attack is a major problem in fingerprint recognition systems. In this paper, we introduce an aliveness detection method based on multiple static features, which are derived from a single fingerprint image. The static features comprise individual pore spacing, residual noise and several first order statistics. Specifically, a correlation filter is adopted to address individual pore spacing. The multiple static features are useful for reflecting the physiological and statistical characteristics of live and fake fingerprints. The classification can be made by calculating the liveness scores from each feature and fusing the scores through a classifier. On our dataset, we compare nine classifiers, and the best classification rate of 85% is attained by using a Reduced Multivariate Polynomial classifier. Our approach is faster and more convenient for aliveness checks in field applications.
Abstract: This paper presents a novel method for improving
classification performance for cancer classification with very little
microarray gene expression data. The method employs classification
with individual gene ranking and gene subset ranking. For selection
and classification, the proposed method uses the same classifier. The
method is applied to three publicly available cancer gene expression
datasets: Lymphoma, Liver and Leukaemia. Three
different classifiers, namely Support Vector Machines one-against-all
(SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant
Analysis (LDA), were tested. The results indicate improved
performance for the SVM-OAA classifier, with satisfactory results on
all three datasets, when compared with the other two classifiers.
Abstract: Bidding is a very important business function for finding
potential contractors for construction projects. Moreover, the bid markup is
one of the most important decisions for a bidder seeking a reasonable
profit. Since the bidding system is a complex adaptive system, a bidding
agent needs a learning process to gain more valuable knowledge for a bid,
especially from past public bidding information. In this paper, we
propose an iterative agent learning model for bidders to make markup
decisions. A classifier for public bidding information named PIBS is
developed to make full use of historical data for classifying new bidding
information. A simulation and experimental study is performed to
show the validity of the proposed classifier. Some factors that affect
the validity of PIBS are also analyzed at the end of this work.
Abstract: A state-of-the-art Speaker Identification (SI) system requires a robust feature extraction unit followed by a speaker modeling scheme for generalized representation of these features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for SI applications. However, due to the structure of its filter bank, MFCC captures vocal tract characteristics more effectively in the lower frequency regions. This paper proposes a new set of features using a complementary filter bank structure which improves distinguishability of speaker-specific cues present in the higher frequency zone. Unlike high-level features that are difficult to extract, the proposed feature set involves little computational burden during the extraction process. When combined with MFCC via a parallel implementation of speaker models, the proposed feature set outperforms baseline MFCC significantly. This proposition is validated by experiments conducted on two different kinds of public databases, namely YOHO (microphone speech) and POLYCOST (telephone speech), with Gaussian Mixture Models (GMM) as the classifier for various model orders.