Abstract: Diabetic retinopathy is characterized by the development of retinal microaneurysms. The damage can be prevented if disease is treated in its early stages. In this paper, we are comparing Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers for automatic microaneurysm detection in images acquired through non-dilated pupils. The Nearest Neighbor classifier is used as a baseline for comparison. Detected microaneurysms are validated with expert ophthalmologists’ hand-drawn ground-truths. The sensitivity, specificity, precision and accuracy of each method are also compared.
Abstract: In this study, it was tried to identify some heart rhythm disorders by electrocardiography (ECG) data that is taken from MIT-BIH arrhythmia database by subtracting the required features, presenting to artificial neural networks (ANN), artificial immune systems (AIS), artificial neural network based on artificial immune system (AIS-ANN) and particle swarm optimization based artificial neural network (PSO-NN) classifier systems. The main purpose of this study is to evaluate the performance of hybrid AIS-ANN and PSO-ANN classifiers with regard to the ANN and AIS. For this purpose, the normal sinus rhythm (NSR), atrial premature contraction (APC), sinus arrhythmia (SA), ventricular trigeminy (VTI), ventricular tachycardia (VTK) and atrial fibrillation (AF) data for each of the RR intervals were found. Then these data in the form of pairs (NSR-APC, NSR-SA, NSR-VTI, NSR-VTK and NSR-AF) is created by combining discrete wavelet transform which is applied to each of these two groups of data and two different data sets with 9 and 27 features were obtained from each of them after data reduction. Afterwards, the data randomly was firstly mixed within themselves, and then 4-fold cross validation method was applied to create the training and testing data. The training and testing accuracy rates and training time are compared with each other.
As a result, performances of the hybrid classification systems, AIS-ANN and PSO-ANN were seen to be close to the performance of the ANN system. Also, the results of the hybrid systems were much better than AIS, too. However, ANN had much shorter period of training time than other systems. In terms of training times, ANN was followed by PSO-ANN, AIS-ANN and AIS systems respectively. Also, the features that extracted from the data affected the classification results significantly.
Abstract: This work presents a proposal to perform contextual sentiment analysis using a supervised learning algorithm and disregarding the extensive training of annotators. To achieve this goal, a web platform was developed to perform the entire procedure outlined in this paper. The main contribution of the pipeline described in this article is to simplify and automate the annotation process through a system of analysis of congruence between the notes. This ensured satisfactory results even without using specialized annotators in the context of the research, avoiding the generation of biased training data for the classifiers. For this, a case
study was conducted in a blog of entrepreneurship. The experimental results were consistent with the literature related annotation using formalized process with experts.
Abstract: Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modeling and testing work was done in R and Greenplum in-database analytic tools.
Abstract: Associative classification (AC) is a data mining approach that combines association rule and classification to build classification models (classifiers). AC has attracted a significant attention from several researchers mainly because it derives accurate classifiers that contain simple yet effective rules. In the last decade, a number of associative classification algorithms have been proposed such as Classification based Association (CBA), Classification based on Multiple Association Rules (CMAR), Class based Associative Classification (CACA), and Classification based on Predicted Association Rule (CPAR). This paper surveys major AC algorithms and compares the steps and methods performed in each algorithm including: rule learning, rule sorting, rule pruning, classifier building, and class prediction.
Abstract: Feature selection is a process to select features which are more informative. It is one of the important steps in knowledge discovery. The problem is that all genes are not important in gene expression data. Some of the genes may be redundant, and others may be irrelevant and noisy. Here a novel approach is proposed Hybrid K-Mean-Quick Reduct (KMQR) algorithm for gene selection from gene expression data. In this study, the entire dataset is divided into clusters by applying K-Means algorithm. Each cluster contains similar genes. The high class discriminated genes has been selected based on their degree of dependence by applying Quick Reduct algorithm to all the clusters. Average Correlation Value (ACV) is calculated for the high class discriminated genes. The clusters which have the ACV value as 1 is determined as significant clusters, whose classification accuracy will be equal or high when comparing to the accuracy of the entire dataset. The proposed algorithm is evaluated using WEKA classifiers and compared. The proposed work shows that the high classification accuracy.
Abstract: In this paper we propose a new classification method for automatic sleep scoring using an artificial neural network based decision tree. It attempts to treat sleep scoring progress as a series of two-class problems and solves them with a decision tree made up of a group of neural network classifiers, each of which uses a special feature set and is aimed at only one specific sleep stage in order to maximize the classification effect. A single electroencephalogram (EEG) signal is used for our analysis rather than depending on multiple biological signals, which makes greatly simplifies the data acquisition process. Experimental results demonstrate that the average epoch by epoch agreement between the visual and the proposed method in separating 30s wakefulness+S1, REM, S2 and SWS epochs was 88.83%. This study shows that the proposed method performed well in all the four stages, and can effectively limit error propagation at the same time. It could, therefore, be an efficient method for automatic sleep scoring. Additionally, since it requires only a small volume of data it could be suited to pervasive applications.
Abstract: Text categorization is the problem of classifying text
documents into a set of predefined classes. After a preprocessing
step, the documents are typically represented as large sparse vectors.
When training classifiers on large collections of documents, both the
time and memory restrictions can be quite prohibitive. This justifies
the application of feature selection methods to reduce the
dimensionality of the document-representation vector. In this paper,
we present three feature selection methods: Information Gain,
Support Vector Machine feature selection called (SVM_FS) and
Genetic Algorithm with SVM (called GA_SVM). We show that the
best results were obtained with GA_SVM method for a relatively
small dimension of the feature vector.
Abstract: This paper regards the phenomena of intensive suburbanization and urbanization in Olomouc city and in Olomouc region in general for the period of 1986–2009. A Remote Sensing approach that involves tracking of changes in Land Cover units is proposed to quantify the urbanization state and trends in temporal and spatial aspects. It actually consisted of two approaches, Experiment 1 and Experiment 2 which implied two different image classification solutions in order to provide Land Cover maps for each 1986–2009 time split available in the Landsat image set. Experiment 1 dealt with the unsupervised classification, while Experiment 2 involved semi- supervised classification, using a combination of object-based and pixel-based classifiers. The resulting Land Cover maps were subsequently quantified for the proportion of urban area unit and its trend through time, and also for the urban area unit stability, yielding the relation of spatial and temporal development of the urban area unit. Some outcomes seem promising but there is indisputably room for improvements of source data and also processing and filtering.
Abstract: Learning using labeled and unlabelled data has
received considerable amount of attention in the machine learning
community due its potential in reducing the need for expensive
labeled data. In this work we present a new method for combining
labeled and unlabeled data based on classifier ensembles. The model
we propose assumes each classifier in the ensemble observes the
input using different set of features. Classifiers are initially trained
using some labeled samples. The trained classifiers learn further
through labeling the unknown patterns using a teaching signals that is
generated using the decision of the classifier ensemble, i.e. the
classifiers self-supervise each other. Experiments on a set of object
images are presented. Our experiments investigate different classifier
models, different fusing techniques, different training sizes and
different input features. Experimental results reveal that the proposed
self-supervised ensemble learning approach reduces classification
error over the single classifier and the traditional ensemble classifier
approachs.
Abstract: An early and accurate detection of Alzheimer's disease (AD) is an important stage in the treatment of individuals suffering from AD. We present an approach based on the use of structural magnetic resonance imaging (sMRI) phase images to distinguish between normal controls (NC), mild cognitive impairment (MCI) and AD patients with clinical dementia rating (CDR) of 1. Independent component analysis (ICA) technique is used for extracting useful features which form the inputs to the support vector machines (SVM), K nearest neighbour (kNN) and multilayer artificial neural network (ANN) classifiers to discriminate between the three classes. The obtained results are encouraging in terms of classification accuracy and effectively ascertain the usefulness of phase images for the classification of different stages of Alzheimer-s disease.
Abstract: Naïve Bayes classifiers are simple probabilistic
classifiers. Classification extracts patterns by using data file with a set
of labeled training examples and is currently one of the most
significant areas in data mining. However, Naïve Bayes assumes the
independence among the features. Structural learning among the
features thus helps in the classification problem. In this study, the use
of structural learning in Bayesian Network is proposed to be applied
where there are relationships between the features when using the
Naïve Bayes. The improvement in the classification using structural
learning is shown if there exist relationship between the features or
when they are not independent.
Abstract: In this paper, a new learning algorithm based on a
hybrid metaheuristic integrating Differential Evolution (DE) and
Reduced Variable Neighborhood Search (RVNS) is introduced to train
the classification method PROAFTN. To apply PROAFTN, values of
several parameters need to be determined prior to classification. These
parameters include boundaries of intervals and relative weights for
each attribute. Based on these requirements, the hybrid approach,
named DEPRO-RVNS, is presented in this study. In some cases, the
major problem when applying DE to some classification problems
was the premature convergence of some individuals to local optima.
To eliminate this shortcoming and to improve the exploration and
exploitation capabilities of DE, such individuals were set to iteratively
re-explored using RVNS. Based on the generated results on
both training and testing data, it is shown that the performance of
PROAFTN is significantly improved. Furthermore, the experimental
study shows that DEPRO-RVNS outperforms well-known machine
learning classifiers in a variety of problems.
Abstract: Purpose of this work is the development of an
automatic classification system which could be useful for radiologists
in the investigation of breast cancer. The software has been designed
in the framework of the MAGIC-5 collaboration.
In the automatic classification system the suspicious regions with
high probability to include a lesion are extracted from the image as
regions of interest (ROIs). Each ROI is characterized by some
features based on morphological lesion differences.
Some classifiers as a Feed Forward Neural Network, a K-Nearest
Neighbours and a Support Vector Machine are used to distinguish the
pathological records from the healthy ones.
The results obtained in terms of sensitivity (percentage of
pathological ROIs correctly classified) and specificity (percentage of
non-pathological ROIs correctly classified) will be presented through
the Receive Operating Characteristic curve (ROC). In particular the
best performances are 88% ± 1 of area under ROC curve obtained
with the Feed Forward Neural Network.
Abstract: Combined therapy using Interferon and Ribavirin is the standard treatment in patients with chronic hepatitis C. However, the number of responders to this treatment is low, whereas its cost and side effects are high. Therefore, there is a clear need to predict patient’s response to the treatment based on clinical information to protect the patients from the bad drawbacks, Intolerable side effects and waste of money. Different machine learning techniques have been developed to fulfill this purpose. From these techniques are Associative Classification (AC) and Decision Tree (DT). The aim of this research is to compare the performance of these two techniques in the prediction of virological response to the standard treatment of HCV from clinical information. 200 patients treated with Interferon and Ribavirin; were analyzed using AC and DT. 150 cases had been used to train the classifiers and 50 cases had been used to test the classifiers. The experiment results showed that the two techniques had given acceptable results however the best accuracy for the AC reached 92% whereas for DT reached 80%.
Abstract: In this paper we are interested in classification problems
with a performance constraint on error probability. In such
problems if the constraint cannot be satisfied, then a rejection option
is introduced. For binary labelled classification, a number of SVM
based methods with rejection option have been proposed over the
past few years. All of these methods use two thresholds on the SVM
output. However, in previous works, we have shown on synthetic data
that using thresholds on the output of the optimal SVM may lead to
poor results for classification tasks with performance constraint. In
this paper a new method for supervised classification with rejection
option is proposed. It consists in two different classifiers jointly
optimized to minimize the rejection probability subject to a given
constraint on error rate. This method uses a new kernel based linear
learning machine that we have recently presented. This learning
machine is characterized by its simplicity and high training speed
which makes the simultaneous optimization of the two classifiers
computationally reasonable. The proposed classification method with
rejection option is compared to a SVM based rejection method
proposed in recent literature. Experiments show the superiority of
the proposed method.
Abstract: Until recently, researchers have developed various
tools and methodologies for effective clinical decision-making.
Among those decisions, chest pain diseases have been one of
important diagnostic issues especially in an emergency department. To
improve the ability of physicians in diagnosis, many researchers have
developed diagnosis intelligence by using machine learning and data
mining. However, most of the conventional methodologies have been
generally based on a single classifier for disease classification and
prediction, which shows moderate performance. This study utilizes an
ensemble strategy to combine multiple different classifiers to help
physicians diagnose chest pain diseases more accurately than ever.
Specifically the ensemble strategy is applied by using the integration
of decision trees, neural networks, and support vector machines. The
ensemble models are applied to real-world emergency data. This study
shows that the performance of the ensemble models is superior to each
of single classifiers.
Abstract: An automated wood recognition system is designed to
classify tropical wood species.The wood features are extracted based
on two feature extractors: Basic Grey Level Aura Matrix (BGLAM)
technique and statistical properties of pores distribution (SPPD)
technique. Due to the nonlinearity of the tropical wood species
separation boundaries, a pre classification stage is proposed which
consists ofKmeans clusteringand kernel discriminant analysis (KDA).
Finally, Linear Discriminant Analysis (LDA) classifier and KNearest
Neighbour (KNN) are implemented for comparison purposes.
The study involves comparison of the system with and without pre
classification using KNN classifier and LDA classifier.The results
show that the inclusion of the pre classification stage has improved
the accuracy of both the LDA and KNN classifiers by more than
12%.
Abstract: Text Mining is around applying knowledge discovery
techniques to unstructured text is termed knowledge discovery in text
(KDT), or Text data mining or Text Mining. In decision tree
approach is most useful in classification problem. With this
technique, tree is constructed to model the classification process.
There are two basic steps in the technique: building the tree and
applying the tree to the database. This paper describes a proposed
C5.0 classifier that performs rulesets, cross validation and boosting
for original C5.0 in order to reduce the optimization of error ratio.
The feasibility and the benefits of the proposed approach are
demonstrated by means of medial data set like hypothyroid. It is
shown that, the performance of a classifier on the training cases from
which it was constructed gives a poor estimate by sampling or using a
separate test file, either way, the classifier is evaluated on cases that
were not used to build and evaluate the classifier are both are large. If
the cases in hypothyroid.data and hypothyroid.test were to be
shuffled and divided into a new 2772 case training set and a 1000
case test set, C5.0 might construct a different classifier with a lower
or higher error rate on the test cases. An important feature of see5 is
its ability to classifiers called rulesets. The ruleset has an error rate
0.5 % on the test cases. The standard errors of the means provide an
estimate of the variability of results. One way to get a more reliable
estimate of predictive is by f-fold –cross- validation. The error rate of
a classifier produced from all the cases is estimated as the ratio of the
total number of errors on the hold-out cases to the total number of
cases. The Boost option with x trials instructs See5 to construct up to
x classifiers in this manner. Trials over numerous datasets, large and
small, show that on average 10-classifier boosting reduces the error
rate for test cases by about 25%.
Abstract: Classification of Persian printed numeral characters
has been considered and a proposed system has been introduced. In
representation stage, for the first time in Persian optical character
recognition, extended moment invariants has been utilized as
characters image descriptor. In classification stage, four different
classifiers namely minimum mean distance, nearest neighbor rule,
multi layer perceptron, and fuzzy min-max neural network has been
used, which first and second are traditional nonparametric statistical
classifier. Third is a well-known neural network and forth is a kind of
fuzzy neural network that is based on utilizing hyperbox fuzzy sets.
Set of different experiments has been done and variety of results has
been presented. The results showed that extended moment invariants
are qualified as features to classify Persian printed numeral
characters.