Abstract: In recent years, researchers have developed various
tools and methodologies for effective clinical decision-making.
Among these decisions, the diagnosis of chest pain diseases has been one of
the important issues, especially in the emergency department. To
improve the ability of physicians in diagnosis, many researchers have
developed diagnosis intelligence by using machine learning and data
mining. However, most conventional methodologies
rely on a single classifier for disease classification and
prediction, which yields only moderate performance. This study utilizes an
ensemble strategy to combine multiple different classifiers to help
physicians diagnose chest pain diseases more accurately.
Specifically, the ensemble strategy integrates
decision trees, neural networks, and support vector machines. The
ensemble models are applied to real-world emergency data. This study
shows that the performance of the ensemble models is superior to that of
each single classifier.
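The abstract above does not say how the classifier outputs are combined, so the following is only a minimal sketch of one common ensemble rule, majority voting. The three stand-in classifiers, their thresholds, and the labels "cardiac" / "non-cardiac" are hypothetical placeholders for trained decision tree, neural network, and support vector machine models.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A classifier is modeled as any function mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: List[Classifier], sample: Sequence[float]) -> str:
    """Return the label predicted by most classifiers.

    Ties are broken in favor of the label that was voted first,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained models (assumed thresholds, not from the study).
def tree_like(x: Sequence[float]) -> str:
    return "cardiac" if x[0] > 0.5 else "non-cardiac"


def net_like(x: Sequence[float]) -> str:
    return "cardiac" if x[0] + x[1] > 1.0 else "non-cardiac"


def svm_like(x: Sequence[float]) -> str:
    return "cardiac" if x[1] > 0.7 else "non-cardiac"


if __name__ == "__main__":
    ensemble = [tree_like, net_like, svm_like]
    # Two of the three stand-ins vote "cardiac" for this sample.
    print(majority_vote(ensemble, [0.6, 0.5]))
```

Majority voting is used here only because it is the simplest combination rule; weighted voting or stacking would fit the same interface.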
Abstract: Association rule mining is an important problem in data
mining. The massively increasing volume of data in real-life databases
has motivated researchers to design novel, incremental algorithms
for association rule mining. In this paper, we propose an incremental
association rule mining algorithm that integrates a shocking
interestingness criterion into the process of building the model. A
new interestingness measure, called the shocking measure, is introduced. One
of the main features of the proposed approach is that it captures the user's
background knowledge, which is monotonically augmented. An
incremental model that reflects the changing data and the user's beliefs
is attractive because it makes the overall KDD process more
effective and efficient. We implemented the proposed approach,
evaluated it on several public datasets, and found the results quite
promising.
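The abstract does not define the shocking measure, so it is not reproduced here. As a rough illustration of the incremental side only, the sketch below keeps itemset counts up to date as new transactions arrive and derives the standard support and confidence of a rule from them. The class name, the size limit, and the grocery items are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable


class IncrementalCounts:
    """Maintains counts of all itemsets up to max_size as transactions arrive."""

    def __init__(self, max_size: int = 2) -> None:
        self.max_size = max_size
        self.n = 0  # number of transactions seen so far
        self.counts: Counter = Counter()

    def add(self, transaction: Iterable[str]) -> None:
        """Update the counts with one new transaction, without rescanning old data."""
        items = sorted(set(transaction))
        self.n += 1
        for k in range(1, self.max_size + 1):
            for subset in combinations(items, k):
                self.counts[frozenset(subset)] += 1

    def support(self, itemset: Iterable[str]) -> float:
        """Fraction of transactions containing the itemset."""
        return self.counts[frozenset(itemset)] / self.n if self.n else 0.0

    def confidence(self, antecedent: Iterable[str], consequent: Iterable[str]) -> float:
        """Estimate of P(consequent | antecedent).

        Only valid when antecedent and consequent together have at most
        max_size items, since larger itemsets are not counted.
        """
        a = frozenset(antecedent)
        both = a | frozenset(consequent)
        base = self.counts[a]
        return self.counts[both] / base if base else 0.0


if __name__ == "__main__":
    model = IncrementalCounts()
    for t in (["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]):
        model.add(t)
    print(model.confidence(["bread"], ["milk"]))
    model.add(["bread", "milk"])  # a new increment updates the same counts
    print(model.confidence(["bread"], ["milk"]))
```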
Abstract: The application of Information Technology (IT) has
revolutionized the functioning of business all over the world. Its
impact has been felt most strongly in information-dependent
industries, and tourism is one such industry. The conceptual
framework in this study presents an innovative travel
information search system for mobile devices. The system
delivers travel information (such as hotels, restaurants, tourist
attractions and souvenir shops) to each user through traveler
segmentation: a data mining technique segments tourists'
behavior patterns and then matches them with tourism products and
services. The system is designed to support incremental
knowledge learning. It is a marketing strategy that helps businesses
respond to travelers' demands effectively.
Abstract: This article presents an integrated
method for detecting steganographic content embedded by new,
unknown programs. The method is based on data mining and
aggregated hypothesis testing. The article covers the theoretical
foundations used to deploy the proposed detection system and
describes the improvements proposed to the basic system idea.
Further, the main experimental results and implementation details are
collected and described. Finally, example test results are
presented.
Abstract: The aim of this paper is to present a three-step
methodology for forecasting supply chain demand. In the first step, various data
mining techniques are applied to prepare the data for the
forecasting models. In the second step, the modeling step, an
artificial neural network and a support vector machine are presented
after the Mean Absolute Percentage Error index is defined for measuring
error. The structure of the artificial neural network is selected based on
previous researchers' results, and in this article the accuracy of the
network is increased by using sensitivity analysis. The best forecast
from the classical forecasting methods (Moving Average, Exponential
Smoothing, and Exponential Smoothing with Trend) is obtained from
the prepared data and compared with the results of the support
vector machine and the proposed artificial neural network. The results
show that the artificial neural network forecasts more precisely
than the other methods. Finally, the stability of the forecasting
methods is analyzed by using raw data, and the effectiveness of
clustering analysis is also measured.
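The error index and two of the classical baselines named above can be sketched in a few lines. This is an illustrative sketch under textbook definitions, not the paper's implementation: the function names, the window size, the smoothing constant alpha, and the demand figures are all assumptions.

```python
from typing import List, Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent.

    Periods with zero actual demand are skipped to avoid division by zero.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


def moving_average_forecast(series: Sequence[float], window: int) -> List[float]:
    """One-step-ahead forecasts; element i predicts series[window + i]."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]


def exponential_smoothing_forecast(series: Sequence[float], alpha: float) -> List[float]:
    """Simple exponential smoothing; element i predicts series[i + 1].

    The first forecast is initialized to the first observation.
    """
    forecasts = [series[0]]
    for t in range(1, len(series) - 1):
        forecasts.append(alpha * series[t] + (1 - alpha) * forecasts[-1])
    return forecasts


if __name__ == "__main__":
    demand = [100.0, 110.0, 120.0, 130.0]  # made-up demand history
    ma = moving_average_forecast(demand, window=2)
    es = exponential_smoothing_forecast(demand, alpha=0.5)
    print("MA  MAPE:", mape(demand[2:], ma))
    print("SES MAPE:", mape(demand[1:], es))
```

Comparing methods on the same held-out periods, as the abstract describes, amounts to calling `mape` with each method's forecasts for those periods.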
Abstract: The self-organizing map (SOM) is a well-known data
reduction technique used in data mining. It can reveal structure in
data sets through data visualization that is otherwise hard to detect
from raw data alone. However, interpretation through visual
inspection is prone to errors and can be very tedious. There are
several techniques for the automatic detection of clusters of code
vectors found by SOM, but they generally do not take into account
the distribution of code vectors; this may lead to unsatisfactory
clustering and poor definition of cluster boundaries, particularly
where the density of data points is low. In this paper, we propose the
use of an adaptive heuristic particle swarm optimization (PSO)
algorithm for finding cluster boundaries directly from the code
vectors obtained from SOM. The application of our method to
several standard data sets demonstrates its feasibility. The PSO algorithm
utilizes the so-called U-matrix of the SOM to determine cluster boundaries;
the results of this novel automatic method compare very favorably with
boundary detection through traditional algorithms, namely k-means
and hierarchical approaches, which are normally used to interpret
the output of the SOM.
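For readers unfamiliar with the U-matrix mentioned above, the sketch below computes one under a common definition: each map node is assigned the mean Euclidean distance between its code vector and those of its grid neighbors, so high values suggest cluster boundaries. It assumes a rectangular grid with 4-neighborhoods and does not include the PSO boundary search; the toy codebook is made up.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def u_matrix(codebook: List[List[Vector]]) -> List[List[float]]:
    """Mean distance from each node's code vector to its 4-connected neighbors.

    codebook[r][c] is the code vector of the SOM node at grid row r, column c.
    """
    rows, cols = len(codebook), len(codebook[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(codebook[r][c], codebook[nr][nc]))
            # A 1x1 map has no neighbors; its U-value is left at 0.0.
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result


if __name__ == "__main__":
    # Toy 1x4 map: the first two nodes are close together, as are the last two.
    toy = [[(0.0, 0.0), (0.0, 1.0), (5.0, 1.0), (5.0, 2.0)]]
    print(u_matrix(toy))
```

In the toy map the two middle nodes receive the largest values, which is where a boundary between the two groups of code vectors would be drawn.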
Abstract: This paper discusses the use of explorative data
mining tools that allow the educator to explore new relationships
between reported learning experiences and actual activities,
even if there are multiple dimensions with a large number
of measured items. The underlying technology is based on
the so-called Compendium Platform for Reproducible Computing
(http://www.freestatistics.org), which was built on top of the computational
R Framework (http://www.wessa.net).
Abstract: Phishing, or stealing of sensitive information on the
web, has dealt a major blow to Internet Security in recent times. Most
of the existing anti-phishing solutions fail to handle the fuzziness
involved in phish detection, thus leading to a large number of false
positives. This fuzziness is attributed to the use of HTML, a highly flexible and,
at the same time, highly ambiguous language. We introduce a
new approach to phishing detection that tries to determine systematically
whether a given page is phished, using the corresponding
original page as the basis of the comparison. It analyzes the layout of
the pages under consideration to determine the percentage distortion
between them, which indicates any form of malicious alteration. The
system is designed as an intelligent system that employs dynamic
assessment to identify brand-new phishing attacks accurately
and is expected to prove effective in reducing the number of false positives.
This framework could potentially be used as a knowledge base for
educating internet users about phishing.
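The abstract does not specify how the percentage distortion between two layouts is computed. Purely as an assumed stand-in, the sketch below reduces each page to its sequence of opening tags and reports the dissimilarity of the two sequences as a percentage, using Python's standard `html.parser` and `difflib` modules. The function names and the sample markup are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from typing import List


class TagCollector(HTMLParser):
    """Records the name of every opening tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html: str) -> List[str]:
    """Return the page's opening tags as a crude proxy for its layout."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def layout_distortion(original_html: str, suspect_html: str) -> float:
    """Percentage dissimilarity between the two tag sequences (0 means identical).

    This is an assumed metric for illustration, not the paper's measure.
    """
    matcher = SequenceMatcher(
        None, tag_sequence(original_html), tag_sequence(suspect_html), autojunk=False
    )
    return 100.0 * (1.0 - matcher.ratio())


if __name__ == "__main__":
    original = "<html><body><form><input><input></form></body></html>"
    suspect = "<html><body><form><input><input><input></form></body></html>"
    print(layout_distortion(original, suspect))
```

A tag-sequence comparison ignores styling and attribute changes, so a practical system would need a richer layout representation.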
Abstract: The paper discusses the mathematics of pattern
indexing and its applications to recognition of visual patterns that are
found in video clips. It is shown that (a) pattern indexes can be
represented by collections of inverted patterns, and (b) solutions to
pattern classification problems can be found as intersections and
histograms of inverted patterns, so that matching of the original
patterns is avoided.
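The idea of answering classification queries from inverted patterns rather than by matching the original patterns resembles a classic inverted index. The sketch below is a loose illustration under that assumption, not the paper's formalism: each feature maps to the set of pattern identifiers containing it, exact candidates come from set intersection, and approximate candidates from a vote histogram. The feature encoding and pattern identifiers are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Set


def build_inverted_index(patterns: Dict[str, Iterable[Hashable]]) -> Dict[Hashable, Set[str]]:
    """Map each feature to the set of pattern ids that contain it."""
    index: Dict[Hashable, Set[str]] = defaultdict(set)
    for pattern_id, features in patterns.items():
        for feature in features:
            index[feature].add(pattern_id)
    return index


def exact_candidates(index: Dict[Hashable, Set[str]], query: Iterable[Hashable]) -> Set[str]:
    """Pattern ids containing every query feature, found by intersection."""
    sets = [index.get(feature, set()) for feature in set(query)]
    return set.intersection(*sets) if sets else set()


def vote_histogram(index: Dict[Hashable, Set[str]], query: Iterable[Hashable]) -> Counter:
    """Count, per pattern id, how many query features it shares."""
    histogram: Counter = Counter()
    for feature in set(query):
        histogram.update(index.get(feature, ()))
    return histogram


if __name__ == "__main__":
    stored = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {5}}
    idx = build_inverted_index(stored)
    print(exact_candidates(idx, {2, 3}))
    print(vote_histogram(idx, {1, 2, 5}).most_common())
```

Neither query touches the stored patterns themselves; only the inverted sets are read.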
Abstract: Diabetes is one of the most prevalent diseases
worldwide, with an increased number of complications, of which retinopathy
is one of the most common. This paper describes how data
mining and case-based reasoning were integrated to predict
retinopathy prevalence among diabetes patients in Malaysia. The
knowledge base required was built after literature reviews and
interviews with medical experts. Data from a total of 140 diabetes patients
were used to train the prediction system. A voting mechanism selects
the best prediction results from the two techniques used. The results
show that both data mining and case-based reasoning
can be used for retinopathy prediction with an improved accuracy of
85%.
Abstract: Property investment in the real estate industry carries
high risk because of high costs and the uncertainty factors that affect the decisions
made. The analytic hierarchy process has long been used
for risk analysis; it relies on experts' opinions to measure the
uncertainty of the risk factors. Consequently,
experts with different levels of experience form different opinions,
which leads to conflict among experts in the field. The objective
of this paper is to propose a new technique for measuring the uncertainty
of the risk factors based on a multidimensional data model and data
mining techniques as a deterministic approach. The proposed technique
consists of a basic framework with four modules: user,
technology, end-user access tools and applications. In this paper, the property
investment risk analysis is defined as a micro-level analysis because the
features of the property are considered in the
analysis.
Abstract: A data cutting and sorting method (DCSM) is proposed
to optimize the performance of data mining. DCSM reduces the
calculation time by removing redundant data during the data
mining process. In addition, DCSM minimizes the computational units
by splitting the database and by sorting data with support counts. When
searching the health examination database of an electronics
manufacturing company for the relationship between metabolic syndrome
and lifestyle, DCSM demonstrates higher search
efficiency than the traditional Apriori algorithm in tests with different
support counts.
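The abstract gives no algorithmic details of DCSM, so the sketch below is not DCSM itself. It only illustrates the two general ideas mentioned, removing data that cannot contribute and sorting by support count, as a preprocessing pass before Apriori-style mining. Dropping infrequent items is safe because any itemset containing an infrequent item is itself infrequent. The function name and the sample transactions are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def prune_and_sort(
    transactions: Iterable[Iterable[str]], min_count: int
) -> Tuple[Counter, List[List[str]]]:
    """Drop items below the minimum support count and sort the rest.

    Returns the per-item support counts and the reduced transactions, with
    items ordered by descending support count (ties broken alphabetically).
    Transactions left empty after pruning are removed entirely.
    """
    transactions = [set(t) for t in transactions]
    counts = Counter(item for t in transactions for item in t)
    frequent = {item for item, count in counts.items() if count >= min_count}

    reduced = []
    for t in transactions:
        kept = sorted((i for i in t if i in frequent), key=lambda i: (-counts[i], i))
        if kept:
            reduced.append(kept)
    return counts, reduced


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["e"]]
    item_counts, smaller = prune_and_sort(data, min_count=2)
    print(smaller)
```

Later candidate-generation passes then scan the smaller, consistently ordered transactions instead of the full database.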