Abstract: Despite the availability of natural disaster related time series data for last 110 years, there is no forecasting tool available to humanitarian relief organizations to determine forecasts for emergency logistics planning. This study develops a forecasting tool based on identifying probability of disaster for each country in the world by using decision tree modeling. Further, the determination of aggregate forecasts leads to efficient pre-disaster planning. Based on the research findings, the relief agencies can optimize the various resources allocation in emergency logistics planning.
Abstract: This paper suggests a decision tree based approach for flexible job shop scheduling with multiple process plans, i.e. each job can be processed through alternative operations, each of which can be processed on alternative machines. The main decision variables are: (a) selecting operation/machine pair; and (b) sequencing the jobs assigned to each machine. As an extension of the priority scheduling approach that selects the best priority rule combination after many simulation runs, this study suggests a decision tree based approach in which a decision tree is used to select a priority rule combination adequate for a specific system state and hence the burdens required for developing simulation models and carrying out simulation runs can be eliminated. The decision tree based scheduling approach consists of construction and scheduling modules. In the construction module, a decision tree is constructed using a four-stage algorithm, and in the scheduling module, a priority rule combination is selected using the decision tree. To show the performance of the decision tree based approach suggested in this study, a case study was done on a flexible job shop with reconfigurable manufacturing cells and a conventional job shop, and the results are reported by comparing it with individual priority rule combinations for the objectives of minimizing total flow time and total tardiness.
Abstract: This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.
Abstract: In this paper we propose a new classification method for automatic sleep scoring using an artificial neural network based decision tree. It attempts to treat sleep scoring progress as a series of two-class problems and solves them with a decision tree made up of a group of neural network classifiers, each of which uses a special feature set and is aimed at only one specific sleep stage in order to maximize the classification effect. A single electroencephalogram (EEG) signal is used for our analysis rather than depending on multiple biological signals, which makes greatly simplifies the data acquisition process. Experimental results demonstrate that the average epoch by epoch agreement between the visual and the proposed method in separating 30s wakefulness+S1, REM, S2 and SWS epochs was 88.83%. This study shows that the proposed method performed well in all the four stages, and can effectively limit error propagation at the same time. It could, therefore, be an efficient method for automatic sleep scoring. Additionally, since it requires only a small volume of data it could be suited to pervasive applications.
Abstract: As the number of networked computers grows,
intrusion detection is an essential component in keeping networks
secure. Various approaches for intrusion detection are currently
being in use with each one has its own merits and demerits. This
paper presents our work to test and improve the performance of a
new class of decision tree c-fuzzy decision tree to detect intrusion.
The work also includes identifying best candidate feature sub set to
build the efficient c-fuzzy decision tree based Intrusion Detection
System (IDS). We investigated the usefulness of c-fuzzy decision
tree for developing IDS with a data partition based on horizontal
fragmentation. Empirical results indicate the usefulness of our
approach in developing the efficient IDS.
Abstract: Data mining has been used very frequently to extract
hidden information from large databases. This paper suggests the use
of decision trees for continuously extracting the clinical reasoning in
the form of medical expert-s actions that is inherent in large number
of EMRs (Electronic Medical records). In this way the extracted data
could be used to teach students of oral medicine a number of orderly
processes for dealing with patients who represent with different
problems within the practice context over time.
Abstract: Decision tree algorithms have very important place at
classification model of data mining. In literature, algorithms use
entropy concept or gini index to form the tree. The shape of the
classes and their closeness to each other some of the factors that
affect the performance of the algorithm. In this paper we introduce a
new decision tree algorithm which employs data (attribute) folding
method and variation of the class variables over the branches to be
created. A comparative performance analysis has been held between
the proposed algorithm and C4.5.
Abstract: Mobile agents are a powerful approach to develop distributed systems since they migrate to hosts on which they have the resources to execute individual tasks. In a dynamic environment like a peer-to-peer network, Agents have to be generated frequently and dispatched to the network. Thus they will certainly consume a certain amount of bandwidth of each link in the network if there are too many agents migration through one or several links at the same time, they will introduce too much transferring overhead to the links eventually, these links will be busy and indirectly block the network traffic, therefore, there is a need of developing routing algorithms that consider about traffic load. In this paper we seek to create cooperation between a probabilistic manner according to the quality measure of the network traffic situation and the agent's migration decision making to the next hop based on decision tree learning algorithms.
Abstract: Combined therapy using Interferon and Ribavirin is the standard treatment in patients with chronic hepatitis C. However, the number of responders to this treatment is low, whereas its cost and side effects are high. Therefore, there is a clear need to predict patient’s response to the treatment based on clinical information to protect the patients from the bad drawbacks, Intolerable side effects and waste of money. Different machine learning techniques have been developed to fulfill this purpose. From these techniques are Associative Classification (AC) and Decision Tree (DT). The aim of this research is to compare the performance of these two techniques in the prediction of virological response to the standard treatment of HCV from clinical information. 200 patients treated with Interferon and Ribavirin; were analyzed using AC and DT. 150 cases had been used to train the classifiers and 50 cases had been used to test the classifiers. The experiment results showed that the two techniques had given acceptable results however the best accuracy for the AC reached 92% whereas for DT reached 80%.
Abstract: In the end of the day, meteorological data and environmental data becomes widely used such as plant varieties selection system. Variety plant selection for planted area is of almost importance for all crops, including varieties of sugarcane. Since sugarcane have many varieties. Variety plant non selection for planting may not be adapted to the climate or soil conditions for planted area. Poor growth, bloom drop, poor fruit, and low price are to be from varieties which were not recommended for those planted area. This paper presents plant varieties selection system for planted areas in Thailand from meteorological data and environmental data by the use of decision tree techniques. With this software developed as an environmental data analysis tool, it can analyze resulting easier and faster. Our software is a front end of WEKA that provides fundamental data mining functions such as classify, clustering, and analysis functions. It also supports pre-processing, analysis, and decision tree output with exporting result. After that, our software can export and display data result to Google maps API in order to display result and plot plant icons effectively.
Abstract: Until recently, researchers have developed various
tools and methodologies for effective clinical decision-making.
Among those decisions, chest pain diseases have been one of
important diagnostic issues especially in an emergency department. To
improve the ability of physicians in diagnosis, many researchers have
developed diagnosis intelligence by using machine learning and data
mining. However, most of the conventional methodologies have been
generally based on a single classifier for disease classification and
prediction, which shows moderate performance. This study utilizes an
ensemble strategy to combine multiple different classifiers to help
physicians diagnose chest pain diseases more accurately than ever.
Specifically the ensemble strategy is applied by using the integration
of decision trees, neural networks, and support vector machines. The
ensemble models are applied to real-world emergency data. This study
shows that the performance of the ensemble models is superior to each
of single classifiers.
Abstract: In this paper a new method is suggested for risk
management by the numerical patterns in data-mining. These patterns
are designed using probability rules in decision trees and are cared to
be valid, novel, useful and understandable. Considering a set of
functions, the system reaches to a good pattern or better objectives.
The patterns are analyzed through the produced matrices and some
results are pointed out. By using the suggested method the direction
of the functionality route in the systems can be controlled and best
planning for special objectives be done.
Abstract: Text Mining is around applying knowledge discovery
techniques to unstructured text is termed knowledge discovery in text
(KDT), or Text data mining or Text Mining. In decision tree
approach is most useful in classification problem. With this
technique, tree is constructed to model the classification process.
There are two basic steps in the technique: building the tree and
applying the tree to the database. This paper describes a proposed
C5.0 classifier that performs rulesets, cross validation and boosting
for original C5.0 in order to reduce the optimization of error ratio.
The feasibility and the benefits of the proposed approach are
demonstrated by means of medial data set like hypothyroid. It is
shown that, the performance of a classifier on the training cases from
which it was constructed gives a poor estimate by sampling or using a
separate test file, either way, the classifier is evaluated on cases that
were not used to build and evaluate the classifier are both are large. If
the cases in hypothyroid.data and hypothyroid.test were to be
shuffled and divided into a new 2772 case training set and a 1000
case test set, C5.0 might construct a different classifier with a lower
or higher error rate on the test cases. An important feature of see5 is
its ability to classifiers called rulesets. The ruleset has an error rate
0.5 % on the test cases. The standard errors of the means provide an
estimate of the variability of results. One way to get a more reliable
estimate of predictive is by f-fold –cross- validation. The error rate of
a classifier produced from all the cases is estimated as the ratio of the
total number of errors on the hold-out cases to the total number of
cases. The Boost option with x trials instructs See5 to construct up to
x classifiers in this manner. Trials over numerous datasets, large and
small, show that on average 10-classifier boosting reduces the error
rate for test cases by about 25%.
Abstract: Globalization, supported by information and
communication technologies, changes the rules of competitiveness
and increases the significance of information, knowledge and
network cooperation. In line with this trend, the need for efficient
trust-building tools has emerged. The absence of trust building
mechanisms and strategies was identified within several studies.
Through trust development, participation on e-business network and
usage of network services will increase and provide to SMEs new
economic benefits. This work is focused on effective trust building
strategies development for electronic business network platforms.
Based on trust building mechanism identification, the questionnairebased
analysis of its significance and minimum level of requirements
was conducted. In the paper, we are confirming the trust dependency
on e-Skills which play crucial role in higher level of trust into the
more sophisticated and complex trust building ICT solutions.
Abstract: This paper focuses on the data-driven generation
of fuzzy IF...THEN rules. The resulted fuzzy rule base can be
applied to build a classifier, a model used for prediction, or
it can be applied to form a decision support system. Among
the wide range of possible approaches, the decision tree and
the association rule based algorithms are overviewed, and two
new approaches are presented based on the a priori fuzzy
clustering based partitioning of the continuous input variables.
An application study is also presented, where the developed
methods are tested on the well known Wisconsin Breast Cancer
classification problem.
Abstract: In this paper, a data mining model to SMEs for detecting financial and operational risk indicators by data mining is presenting. The identification of the risk factors by clarifying the relationship between the variables defines the discovery of knowledge from the financial and operational variables. Automatic and estimation oriented information discovery process coincides the definition of data mining. During the formation of model; an easy to understand, easy to interpret and easy to apply utilitarian model that is far from the requirement of theoretical background is targeted by the discovery of the implicit relationships between the data and the identification of effect level of every factor. In addition, this paper is based on a project which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK).
Abstract: Traffic incident has bad effect on all parts of society
so controlling road networks with enough traffic devices could help
to decrease number of accidents, so using the best method for
optimum site selection of these devices could help to implement good
monitoring system. This paper has considered here important criteria
for optimum site selection of traffic camera based on aggregation
methods such as Bagging and Dempster-Shafer concepts. In the first
step, important criteria such as annual traffic flow, distance from
critical places such as parks that need more traffic controlling were
identified for selection of important road links for traffic camera
installation, Then classification methods such as Artificial neural
network and Decision tree algorithms were employed for
classification of road links based on their importance for camera
installation. Then for improving the result of classifiers aggregation
methods such as Bagging and Dempster-Shafer theories were used.
Abstract: Competing risks survival data that comprises of more
than one type of event has been used in many applications, and one
of these is in clinical study (e.g. in breast cancer study). The
decision tree method can be extended to competing risks survival
data by modifying the split function so as to accommodate two or
more risks which might be dependent on each other. Recently,
researchers have constructed some decision trees for recurrent
survival time data using frailty and marginal modelling. We further
extended the method for the case of competing risks. In this paper,
we developed the decision tree method for competing risks survival
time data based on proportional hazards for subdistribution of
competing risks. In particular, we grow a tree by using deviance
statistic. The application of breast cancer data is presented. Finally,
to investigate the performance of the proposed method, simulation
studies on identification of true group of observations were executed.
Abstract: Performance of any continuous speech recognition system is highly dependent on performance of the acoustic models. Generally, development of the robust spoken language technology relies on the availability of large amounts of data. Common way to cope with little data for training each state of Markov models is treebased state tying. This tying method applies contextual questions to tie states. Manual procedure for question generation suffers from human errors and is time consuming. Various automatically generated questions are used to construct decision tree. There are three approaches to generate questions to construct HMMs based on decision tree. One approach is based on misrecognized phonemes, another approach basically uses feature table and the other is based on state distributions corresponding to context-independent subword units. In this paper, all these methods of automatic question generation are applied to the decision tree on FARSDAT corpus in Persian language and their results are compared with those of manually generated questions. The results show that automatically generated questions yield much better results and can replace manually generated questions in Persian language.
Abstract: Landslide susceptibility map delineates the potential
zones for landslide occurrence. Previous works have applied
multivariate methods and neural networks for mapping landslide
susceptibility. This study proposed a new approach to integrate
decision tree model and spatial cluster statistic for assessing landslide
susceptibility spatially. A total of 2057 landslide cells were digitized
for developing the landslide decision tree model. The relationships of
landslides and instability factors were explicitly represented by using
tree graphs in the model. The local Getis-Ord statistics were used to
cluster cells with high landslide probability. The analytic result from
the local Getis-Ord statistics was classed to create a map of landslide
susceptibility zones. The map was validated using new landslide data
with 482 cells. Results of validation show an accuracy rate of 86.1% in
predicting new landslide occurrence. This indicates that the proposed
approach is useful for improving landslide susceptibility mapping.