Abstract: In recent years, there has been an explosion in the rate of using technology that help discovering the diseases. For example, DNA microarrays allow us for the first time to obtain a "global" view of the cell. It has great potential to provide accurate medical diagnosis, to help in finding the right treatment and cure for many diseases. Various classification algorithms can be applied on such micro-array datasets to devise methods that can predict the occurrence of Leukemia disease. In this study, we compared the classification accuracy and response time among eleven decision tree methods and six rule classifier methods using five performance criteria. The experiment results show that the performance of Random Tree is producing better result. Also it takes lowest time to build model in tree classifier. The classification rules algorithms such as nearest- neighbor-like algorithm (NNge) is the best algorithm due to the high accuracy and it takes lowest time to build model in classification.
Abstract: In order to help the expert to validate association rules
extracted from data, some quality measures are proposed in the
literature. We distinguish two categories: objective and subjective
measures. The first one depends on a fixed threshold and on data
quality from which the rules are extracted. The second one consists
on providing to the expert some tools in the objective to explore and
visualize rules during the evaluation step. However, the number of
extracted rules to validate remains high. Thus, the manually mining
rules task is very hard. To solve this problem, we propose, in this
paper, a semi-automatic method to assist the expert during the
association rule's validation. Our method uses rule-based
classification as follow: (i) We transform association rules into
classification rules (classifiers), (ii) We use the generated classifiers
for data classification. (iii) We visualize association rules with their
quality classification to give an idea to the expert and to assist him
during validation process.
Abstract: The problems arising from unbalanced data sets
generally appear in real world applications. Due to unequal class
distribution, many researchers have found that the performance of
existing classifiers tends to be biased towards the majority class. The
k-nearest neighbors’ nonparametric discriminant analysis is a method
that was proposed for classifying unbalanced classes with good
performance. In this study, the methods of discriminant analysis are
of interest in investigating misclassification error rates for classimbalanced
data of three diabetes risk groups. The purpose of this
study was to compare the classification performance between
parametric discriminant analysis and nonparametric discriminant
analysis in a three-class classification of class-imbalanced data of
diabetes risk groups. Data from a project maintaining healthy
conditions for 599 employees of a government hospital in Bangkok
were obtained for the classification problem. The employees were
divided into three diabetes risk groups: non-risk (90%), risk (5%),
and diabetic (5%). The original data including the variables of
diabetes risk group, age, gender, blood glucose, and BMI were
analyzed and bootstrapped for 50 and 100 samples, 599 observations
per sample, for additional estimation of the misclassification error
rate. Each data set was explored for the departure of multivariate
normality and the equality of covariance matrices of the three risk
groups. Both the original data and the bootstrap samples showed nonnormality
and unequal covariance matrices. The parametric linear
discriminant function, quadratic discriminant function, and the
nonparametric k-nearest neighbors’ discriminant function were
performed over 50 and 100 bootstrap samples and applied to the
original data. Searching the optimal classification rule, the choices of
prior probabilities were set up for both equal proportions (0.33: 0.33:
0.33) and unequal proportions of (0.90:0.05:0.05), (0.80: 0.10: 0.10)
and (0.70, 0.15, 0.15). The results from 50 and 100 bootstrap samples
indicated that the k-nearest neighbors approach when k=3 or k=4 and
the defined prior probabilities of non-risk: risk: diabetic as 0.90:
0.05:0.05 or 0.80:0.10:0.10 gave the smallest error rate of
misclassification. The k-nearest neighbors approach would be
suggested for classifying a three-class-imbalanced data of diabetes
risk groups.
Abstract: This paper focuses on analyzing medical diagnostic data using classification rules in data mining and context reduction in formal concept analysis. It helps in finding redundancies among the various medical examination tests used in diagnosis of a disease. Classification rules have been derived from positive and negative association rules using the Concept lattice structure of the Formal Concept Analysis. Context reduction technique given in Formal Concept Analysis along with classification rules has been used to find redundancies among the various medical examination tests. Also it finds out whether expensive medical tests can be replaced by some cheaper tests.
Abstract: This study proposes a novel recommender system to
provide the advertisements of context-aware services. Our proposed
model is designed to apply a modified collaborative filtering (CF)
algorithm with regard to the several dimensions for the personalization
of mobile devices – location, time and the user-s needs type. In
particular, we employ a classification rule to understand user-s needs
type using a decision tree algorithm. In addition, we collect primary
data from the mobile phone users and apply them to the proposed
model to validate its effectiveness. Experimental results show that the
proposed system makes more accurate and satisfactory advertisements
than comparative systems.
Abstract: The main goal of data mining is to extract accurate, comprehensible and interesting knowledge from databases that may be considered as large search spaces. In this paper, a new, efficient type of Genetic Algorithm (GA) called uniform two-level GA is proposed as a search strategy to discover truly interesting, high-level prediction rules, a difficult problem and relatively little researched, rather than discovering classification knowledge as usual in the literatures. The proposed method uses the advantage of uniform population method and addresses the task of generalized rule induction that can be regarded as a generalization of the task of classification. Although the task of generalized rule induction requires a lot of computations, which is usually not satisfied with the normal algorithms, it was demonstrated that this method increased the performance of GAs and rapidly found interesting rules.
Abstract: The ElectroEncephaloGram (EEG) is useful for
clinical diagnosis and biomedical research. EEG signals often
contain strong ElectroOculoGram (EOG) artifacts produced
by eye movements and eye blinks especially in EEG recorded
from frontal channels. These artifacts obscure the underlying
brain activity, making its visual or automated inspection
difficult. The goal of ocular artifact removal is to remove
ocular artifacts from the recorded EEG, leaving the underlying
background signals due to brain activity. In recent times,
Independent Component Analysis (ICA) algorithms have
demonstrated superior potential in obtaining the least
dependent source components. In this paper, the independent
components are obtained by using the JADE algorithm (best
separating algorithm) and are classified into either artifact
component or neural component. Neural Network is used for
the classification of the obtained independent components.
Neural Network requires input features that exactly represent
the true character of the input signals so that the neural
network could classify the signals based on those key
characters that differentiate between various signals. In this
work, Auto Regressive (AR) coefficients are used as the input
features for classification. Two neural network approaches
are used to learn classification rules from EEG data. First, a
Polynomial Neural Network (PNN) trained by GMDH (Group
Method of Data Handling) algorithm is used and secondly,
feed-forward neural network classifier trained by a standard
back-propagation algorithm is used for classification and the
results show that JADE-FNN performs better than JADEPNN.
Abstract: Much research into handwritten Thai character
recognition have been proposed, such as comparing heads of
characters, Fuzzy logic and structure trees, etc. This paper presents a
system of handwritten Thai character recognition, which is based on
the Ant-minor algorithm (data mining based on Ant colony
optimization). Zoning is initially used to determine each character.
Then three distinct features (also called attributes) of each character
in each zone are extracted. The attributes are Head zone, End point,
and Feature code. All attributes are used for construct the
classification rules by an Ant-miner algorithm in order to classify
112 Thai characters. For this experiment, the Ant-miner algorithm is
adapted, with a small change to increase the recognition rate. The
result of this experiment is a 97% recognition rate of the training set
(11200 characters) and 82.7% recognition rate of unseen data test
(22400 characters).
Abstract: In this paper a data miner based on the learning
automata is proposed and is called LA-miner. The LA-miner extracts
classification rules from data sets automatically. The proposed
algorithm is established based on the function optimization using
learning automata. The experimental results on three benchmarks
indicate that the performance of the proposed LA-miner is
comparable with (sometimes better than) the Ant-miner (a data miner
algorithm based on the Ant Colony optimization algorithm) and CNZ
(a well-known data mining algorithm for classification).
Abstract: To explore pipelines is one of various bio-mimetic
robot applications. The robot may work in common buildings such as
between ceilings and ducts, in addition to complicated and massive
pipeline systems of large industrial plants. The bio-mimetic robot finds
any troubled area or malfunction and then reports its data. Importantly,
it can not only prepare for but also react to any abnormal routes in the
pipeline. The pipeline monitoring tasks require special types of mobile
robots. For an effective movement along a pipeline, the movement of
the robot will be similar to that of insects or crawling animals. During
its movement along the pipelines, a pipeline monitoring robot has an
important task of finding the shapes of the approaching path on the
pipes. In this paper we propose an effective solution to the pipeline
pattern recognition, based on the fuzzy classification rules for the
measured IR distance data.
Abstract: Bio-chips are used for experiments on genes and
contain various information such as genes, samples and so on. The
two-dimensional bio-chips, in which one axis represent genes and the
other represent samples, are widely being used these days. Instead of
experimenting with real genes which cost lots of money and much
time to get the results, bio-chips are being used for biological
experiments. And extracting data from the bio-chips with high
accuracy and finding out the patterns or useful information from such
data is very important. Bio-chip analysis systems extract data from
various kinds of bio-chips and mine the data in order to get useful
information. One of the commonly used methods to mine the data is
classification. The algorithm that is used to classify the data can be
various depending on the data types or number characteristics and so
on. Considering that bio-chip data is extremely large, an algorithm that
imitates the ecosystem such as the ant algorithm is suitable to use as an
algorithm for classification. This paper focuses on finding the
classification rules from the bio-chip data using the Ant Colony
algorithm which imitates the ecosystem. The developed system takes
in consideration the accuracy of the discovered rules when it applies it
to the bio-chip data in order to predict the classes.
Abstract: This paper details the application of a genetic
programming framework for induction of useful classification rules
from a database of income statements, balance sheets, and cash flow
statements for North American public companies. Potentially
interesting classification rules are discovered. Anomalies in the
discovery process merit further investigation of the application of
genetic programming to the dataset for the problem domain.
Abstract: In the era of great competition, understanding and satisfying
customers- requirements are the critical tasks for a company
to make a profits. Customer relationship management (CRM) thus
becomes an important business issue at present. With the help of
the data mining techniques, the manager can explore and analyze
from a large quantity of data to discover meaningful patterns and
rules. Among all methods, well-known association rule is most
commonly seen. This paper is based on Apriori algorithm and uses
genetic algorithms combining a data mining method to discover fuzzy
classification rules. The mined results can be applied in CRM to
help decision marker make correct business decisions for marketing
strategies.