Abstract: In this study, the clients who applied to a bank branch for loan were analyzed through data mining. The study was composed of the information such as amounts of loans received by personal and SME clients working with the bank branch, installment numbers, number of delays in loan installments, payments available in other banks and number of banks to which they are in debt between 2010 and 2013. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, 5 different types of customers have been determined on the decision tree. The classification of these types of customers has been created with the rating of those posing a risk for the bank branch and the customers have been classified according to the risk ratings.
Abstract: Classification is an important data mining technique
and could be used as data filtering in artificial intelligence. The
broad application of classification for all kind of data leads to be
used in nearly every field of our modern life. Classification helps us
to put together different items according to the feature items decided
as interesting and useful. In this paper, we compare two
classification methods Naïve Bayes and ADTree use to detect spam
e-mail. This choice is motivated by the fact that Naive Bayes
algorithm is based on probability calculus while ADTree algorithm is
based on decision tree. The parameter settings of the above
classifiers use the maximization of true positive rate and
minimization of false positive rate. The experiment results present
classification accuracy and cost analysis in view of optimal classifier
choice for Spam Detection. It is point out the number of attributes to
obtain a tradeoff between number of them and the classification
accuracy.
Abstract: Feature selection has been used in many fields such as
classification, data mining and object recognition and proven to be
effective for removing irrelevant and redundant features from the
original dataset. In this paper, a new design of distributed intrusion
detection system using a combination feature selection model based
on bees and decision tree. Bees algorithm is used as the search
strategy to find the optimal subset of features, whereas decision tree
is used as a judgment for the selected features. Both the produced
features and the generated rules are used by Decision Making Mobile
Agent to decide whether there is an attack or not in the networks.
Decision Making Mobile Agent will migrate through the networks,
moving from node to another, if it found that there is an attack on one
of the nodes, it then alerts the user through User Interface Agent or
takes some action through Action Mobile Agent. The KDD Cup 99
dataset is used to test the effectiveness of the proposed system. The
results show that even if only four features are used, the proposed
system gives a better performance when it is compared with the
obtained results using all 41 features.
Abstract: This paper suggests a decision tree based approach for flexible job shop scheduling with multiple process plans, i.e. each job can be processed through alternative operations, each of which can be processed on alternative machines. The main decision variables are: (a) selecting operation/machine pair; and (b) sequencing the jobs assigned to each machine. As an extension of the priority scheduling approach that selects the best priority rule combination after many simulation runs, this study suggests a decision tree based approach in which a decision tree is used to select a priority rule combination adequate for a specific system state and hence the burdens required for developing simulation models and carrying out simulation runs can be eliminated. The decision tree based scheduling approach consists of construction and scheduling modules. In the construction module, a decision tree is constructed using a four-stage algorithm, and in the scheduling module, a priority rule combination is selected using the decision tree. To show the performance of the decision tree based approach suggested in this study, a case study was done on a flexible job shop with reconfigurable manufacturing cells and a conventional job shop, and the results are reported by comparing it with individual priority rule combinations for the objectives of minimizing total flow time and total tardiness.
Abstract: In this paper we propose a new classification method for automatic sleep scoring using an artificial neural network based decision tree. It attempts to treat sleep scoring progress as a series of two-class problems and solves them with a decision tree made up of a group of neural network classifiers, each of which uses a special feature set and is aimed at only one specific sleep stage in order to maximize the classification effect. A single electroencephalogram (EEG) signal is used for our analysis rather than depending on multiple biological signals, which makes greatly simplifies the data acquisition process. Experimental results demonstrate that the average epoch by epoch agreement between the visual and the proposed method in separating 30s wakefulness+S1, REM, S2 and SWS epochs was 88.83%. This study shows that the proposed method performed well in all the four stages, and can effectively limit error propagation at the same time. It could, therefore, be an efficient method for automatic sleep scoring. Additionally, since it requires only a small volume of data it could be suited to pervasive applications.
Abstract: Performance of any continuous speech recognition system is highly dependent on performance of the acoustic models. Generally, development of the robust spoken language technology relies on the availability of large amounts of data. Common way to cope with little data for training each state of Markov models is treebased state tying. This tying method applies contextual questions to tie states. Manual procedure for question generation suffers from human errors and is time consuming. Various automatically generated questions are used to construct decision tree. There are three approaches to generate questions to construct HMMs based on decision tree. One approach is based on misrecognized phonemes, another approach basically uses feature table and the other is based on state distributions corresponding to context-independent subword units. In this paper, all these methods of automatic question generation are applied to the decision tree on FARSDAT corpus in Persian language and their results are compared with those of manually generated questions. The results show that automatically generated questions yield much better results and can replace manually generated questions in Persian language.
Abstract: The drug discovery process starts with protein
identification because proteins are responsible for many functions
required for maintenance of life. Protein identification further needs
determination of protein function. Proposed method develops a
classifier for human protein function prediction. The model uses
decision tree for classification process. The protein function is
predicted on the basis of matched sequence derived features per each
protein function. The research work includes the development of a
tool which determines sequence derived features by analyzing
different parameters. The other sequence derived features are
determined using various web based tools.
Abstract: Keystroke authentication is a new access control system
to identify legitimate users via their typing behavior. In this paper,
machine learning techniques are adapted for keystroke authentication.
Seven learning methods are used to build models to differentiate user
keystroke patterns. The selected classification methods are Decision
Tree, Naive Bayesian, Instance Based Learning, Decision Table, One
Rule, Random Tree and K-star. Among these methods, three of them
are studied in more details. The results show that machine learning
is a feasible alternative for keystroke authentication. Compared to
the conventional Nearest Neighbour method in the recent research,
learning methods especially Decision Tree can be more accurate. In
addition, the experiment results reveal that 3-Grams is more accurate
than 2-Grams and 4-Grams for feature extraction. Also, combination
of attributes tend to result higher accuracy.
Abstract: Feature selection study is gaining importance due to its contribution to save classification cost in terms of time and computation load. In search of essential features, one of the methods to search the features is via the decision tree. Decision tree act as an intermediate feature space inducer in order to choose essential features. In decision tree-based feature selection, some studies used decision tree as a feature ranker with a direct threshold measure, while others remain the decision tree but utilized pruning condition that act as a threshold mechanism to choose features. This paper proposed threshold measure using Manhattan Hierarchical Cluster distance to be utilized in feature ranking in order to choose relevant features as part of the feature selection process. The result is promising, and this method can be improved in the future by including test cases of a higher number of attributes.
Abstract: In this paper, we present a new learning algorithm for
anomaly based network intrusion detection using improved self
adaptive naïve Bayesian tree (NBTree), which induces a hybrid of
decision tree and naïve Bayesian classifier. The proposed approach
scales up the balance detections for different attack types and keeps
the false positives at acceptable level in intrusion detection. In
complex and dynamic large intrusion detection dataset, the detection
accuracy of naïve Bayesian classifier does not scale up as well as
decision tree. It has been successfully tested in other problem
domains that naïve Bayesian tree improves the classification rates in
large dataset. In naïve Bayesian tree nodes contain and split as
regular decision-trees, but the leaves contain naïve Bayesian
classifiers. The experimental results on KDD99 benchmark network
intrusion detection dataset demonstrate that this new approach scales
up the detection rates for different attack types and reduces false
positives in network intrusion detection.
Abstract: In recent years, a number of works proposing the
combination of multiple classifiers to produce a single
classification have been reported in remote sensing literature. The
resulting classifier, referred to as an ensemble classifier, is
generally found to be more accurate than any of the individual
classifiers making up the ensemble. As accuracy is the primary
concern, much of the research in the field of land cover
classification is focused on improving classification accuracy. This
study compares the performance of four ensemble approaches
(boosting, bagging, DECORATE and random subspace) with a
univariate decision tree as base classifier. Two training datasets,
one without ant noise and other with 20 percent noise was used to
judge the performance of different ensemble approaches. Results
with noise free data set suggest an improvement of about 4% in
classification accuracy with all ensemble approaches in
comparison to the results provided by univariate decision tree
classifier. Highest classification accuracy of 87.43% was achieved
by boosted decision tree. A comparison of results with noisy data
set suggests that bagging, DECORATE and random subspace
approaches works well with this data whereas the performance of
boosted decision tree degrades and a classification accuracy of
79.7% is achieved which is even lower than that is achieved (i.e.
80.02%) by using unboosted decision tree classifier.