Abstract: The healthcare environment is generally perceived as
being information rich yet knowledge poor. However, there is a lack
of effective analysis tools to discover hidden relationships and trends
in data. In fact, valuable knowledge can be discovered from
application of data mining techniques in healthcare system. In this
study, a proficient methodology for the extraction of significant
patterns from the Coronary Heart Disease warehouses for heart
attack prediction, which unfortunately continues to be a leading cause
of mortality in the whole world, has been presented. For this purpose,
we propose to enumerate dynamically the optimal subsets of the
reduced features of high interest by using rough sets technique
associated to dynamic programming. Therefore, we propose to
validate the classification using Random Forest (RF) decision tree to
identify the risky heart disease cases. This work is based on a large
amount of data collected from several clinical institutions based on
the medical profile of patient. Moreover, the experts- knowledge in
this field has been taken into consideration in order to define the
disease, its risk factors, and to establish significant knowledge
relationships among the medical factors. A computer-aided system is
developed for this purpose based on a population of 525 adults. The
performance of the proposed model is analyzed and evaluated based
on set of benchmark techniques applied in this classification problem.
Abstract: Text Mining is an important step of Knowledge
Discovery process. It is used to extract hidden information from notstructured
o semi-structured data. This aspect is fundamental because
much of the Web information is semi-structured due to the nested
structure of HTML code, much of the Web information is linked,
much of the Web information is redundant. Web Text Mining helps
whole knowledge mining process to mining, extraction and
integration of useful data, information and knowledge from Web
page contents.
In this paper, we present a Web Text Mining process able to
discover knowledge in a distributed and heterogeneous multiorganization
environment. The Web Text Mining process is based on
flexible architecture and is implemented by four steps able to
examine web content and to extract useful hidden information
through mining techniques. Our Web Text Mining prototype starts
from the recovery of Web job offers in which, through a Text Mining
process, useful information for fast classification of the same are
drawn out, these information are, essentially, job offer place and
skills.
Abstract: DNA microarrays allow the measurement of expression levels for a large number of genes, perhaps all genes of an organism, within a number of different experimental samples. It is very much important to extract biologically meaningful information from this huge amount of expression data to know the current state of the cell because most cellular processes are regulated by changes in gene expression. Association rule mining techniques are helpful to find association relationship between genes. Numerous association rule mining algorithms have been developed to analyze and associate this huge amount of gene expression data. This paper focuses on some of the popular association rule mining algorithms developed to analyze gene expression data.
Abstract: It is important to predict yield in semiconductor test process in order to increase yield. In this study, yield prediction means finding out defective die, wafer or lot effectively. Semiconductor test process consists of some test steps and each test includes various test items. In other world, test data has a big and complicated characteristic. It also is disproportionably distributed as the number of data belonging to FAIL class is extremely low. For yield prediction, general data mining techniques have a limitation without any data preprocessing due to eigen properties of test data. Therefore, this study proposes an under-sampling method using support vector machine (SVM) to eliminate an imbalanced characteristic. For evaluating a performance, randomly under-sampling method is compared with the proposed method using actual semiconductor test data. As a result, sampling method using SVM is effective in generating robust model for yield prediction.
Abstract: In this paper we use data mining techniques to investigate factors that contribute significantly to enhancing the risk of acute coronary syndrome. We assume that the dependent variable is diagnosis – with dichotomous values showing presence or absence of disease. We have applied binary regression to the factors affecting the dependent variable. The data set has been taken from two different cardiac hospitals of Karachi, Pakistan. We have total sixteen variables out of which one is assumed dependent and other 15 are independent variables. For better performance of the regression model in predicting acute coronary syndrome, data reduction techniques like principle component analysis is applied. Based on results of data reduction, we have considered only 14 out of sixteen factors.
Abstract: The purpose of determining impact significance is to
place value on impacts. Environmental impact assessment review is a
process that judges whether impact significance is acceptable or not in
accordance with the scientific facts regarding environmental,
ecological and socio-economical impacts described in environmental
impact statements (EIS) or environmental impact assessment reports
(EIAR). The first aim of this paper is to summarize the criteria of
significance evaluation from the past review results and accordingly
utilize fuzzy logic to incorporate these criteria into scientific facts. The
second aim is to employ data mining technique to construct an EIS or
EIAR prediction model for reviewing results which can assist
developers to prepare and revise better environmental management
plans in advance. The validity of the previous prediction model
proposed by authors in 2009 is 92.7%. The enhanced validity in this
study can attain 100.0%.
Abstract: Recommender systems are usually regarded as an
important marketing tool in the e-commerce. They use important
information about users to facilitate accurate recommendation. The
information includes user context such as location, time and interest
for personalization of mobile users. We can easily collect information
about location and time because mobile devices communicate with the
base station of the service provider. However, information about user
interest can-t be easily collected because user interest can not be
captured automatically without user-s approval process. User interest
usually represented as a need. In this study, we classify needs into two
types according to prior research. This study investigates the
usefulness of data mining techniques for classifying user need type for
recommendation systems. We employ several data mining techniques
including artificial neural networks, decision trees, case-based
reasoning, and multivariate discriminant analysis. Experimental
results show that CHAID algorithm outperforms other models for
classifying user need type. This study performs McNemar test to
examine the statistical significance of the differences of classification
results. The results of McNemar test also show that CHAID performs
better than the other models with statistical significance.
Abstract: The aim of this paper is to present a methodology in
three steps to forecast supply chain demand. In first step, various data
mining techniques are applied in order to prepare data for entering
into forecasting models. In second step, the modeling step, an
artificial neural network and support vector machine is presented
after defining Mean Absolute Percentage Error index for measuring
error. The structure of artificial neural network is selected based on
previous researchers' results and in this article the accuracy of
network is increased by using sensitivity analysis. The best forecast
for classical forecasting methods (Moving Average, Exponential
Smoothing, and Exponential Smoothing with Trend) is resulted based
on prepared data and this forecast is compared with result of support
vector machine and proposed artificial neural network. The results
show that artificial neural network can forecast more precisely in
comparison with other methods. Finally, forecasting methods'
stability is analyzed by using raw data and even the effectiveness of
clustering analysis is measured.
Abstract: Diagnosis can be achieved by building a model of a
certain organ under surveillance and comparing it with the real time
physiological measurements taken from the patient. This paper deals
with the presentation of the benefits of using Data Mining techniques
in the computer-aided diagnosis (CAD), focusing on the cancer
detection, in order to help doctors to make optimal decisions quickly
and accurately. In the field of the noninvasive diagnosis techniques,
the endoscopic ultrasound elastography (EUSE) is a recent elasticity
imaging technique, allowing characterizing the difference between
malignant and benign tumors. Digitalizing and summarizing the main
EUSE sample movies features in a vector form concern with the use
of the exploratory data analysis (EDA). Neural networks are then
trained on the corresponding EUSE sample movies vector input in
such a way that these intelligent systems are able to offer a very
precise and objective diagnosis, discriminating between benign and
malignant tumors. A concrete application of these Data Mining
techniques illustrates the suitability and the reliability of this
methodology in CAD.
Abstract: This research aims to create a model for analysis of student motivation behavior on e-Learning based on association rule mining techniques in case of the Information Technology for Communication and Learning Course at Suan Sunandha Rajabhat University. The model was created under association rules, one of the data mining techniques with minimum confidence. The results showed that the student motivation behavior model by using association rule technique can indicate the important variables that influence the student motivation behavior on e-Learning.
Abstract: Data Mining aims at discovering knowledge out of
data and presenting it in a form that is easily comprehensible to
humans. One of the useful applications in Egypt is the Cancer
management, especially the management of Acute Lymphoblastic
Leukemia or ALL, which is the most common type of cancer in
children.
This paper discusses the process of designing a prototype that can
help in the management of childhood ALL, which has a great
significance in the health care field. Besides, it has a social impact
on decreasing the rate of infection in children in Egypt. It also
provides valubale information about the distribution and
segmentation of ALL in Egypt, which may be linked to the possible
risk factors.
Undirected Knowledge Discovery is used since, in the case of this
research project, there is no target field as the data provided is
mainly subjective. This is done in order to quantify the subjective
variables. Therefore, the computer will be asked to identify
significant patterns in the provided medical data about ALL. This
may be achieved through collecting the data necessary for the
system, determimng the data mining technique to be used for the
system, and choosing the most suitable implementation tool for the
domain.
The research makes use of a data mining tool, Clementine, so as to
apply Decision Trees technique. We feed it with data extracted from
real-life cases taken from specialized Cancer Institutes. Relevant
medical cases details such as patient medical history and diagnosis
are analyzed, classified, and clustered in order to improve the disease
management.
Abstract: Hemodialysis patients might suffer from unhealthy
care behaviors or long-term dialysis treatments. Ultimately they need
to be hospitalized. If the hospitalization rate of a hemodialysis center
is high, its quality of service would be low. Therefore, how to decrease
hospitalization rate is a crucial problem for health care. In this study
we combined temporal abstraction with data mining techniques for
analyzing the dialysis patients' biochemical data to develop a decision
support system. The mined temporal patterns are helpful for clinicians
to predict hospitalization of hemodialysis patients and to suggest them
some treatments immediately to avoid hospitalization.
Abstract: This paper presents a system for discovering
association rules from collections of unstructured documents called
EART (Extract Association Rules from Text). The EART system
treats texts only not images or figures. EART discovers association
rules amongst keywords labeling the collection of textual documents.
The main characteristic of EART is that the system integrates XML
technology (to transform unstructured documents into structured
documents) with Information Retrieval scheme (TF-IDF) and Data
Mining technique for association rules extraction. EART depends on
word feature to extract association rules. It consists of four phases:
structure phase, index phase, text mining phase and visualization
phase. Our work depends on the analysis of the keywords in the
extracted association rules through the co-occurrence of the keywords
in one sentence in the original text and the existing of the keywords
in one sentence without co-occurrence. Experiments applied on a
collection of scientific documents selected from MEDLINE that are
related to the outbreak of H5N1 avian influenza virus.
Abstract: This paper describes text mining technique for automatically extracting association rules from collections of textual documents. The technique called, Extracting Association Rules from Text (EART). It depends on keyword features for discover association rules amongst keywords labeling the documents. In this work, the EART system ignores the order in which the words occur, but instead focusing on the words and their statistical distributions in documents. The main contributions of the technique are that it integrates XML technology with Information Retrieval scheme (TFIDF) (for keyword/feature selection that automatically selects the most discriminative keywords for use in association rules generation) and use Data Mining technique for association rules discovery. It consists of three phases: Text Preprocessing phase (transformation, filtration, stemming and indexing of the documents), Association Rule Mining (ARM) phase (applying our designed algorithm for Generating Association Rules based on Weighting scheme GARW) and Visualization phase (visualization of results). Experiments applied on WebPages news documents related to the outbreak of the bird flu disease. The extracted association rules contain important features and describe the informative news included in the documents collection. The performance of the EART system compared with another system that uses the Apriori algorithm throughout the execution time and evaluating extracted association rules.
Abstract: This paper is a description approach to predict
incoming and outgoing data rate in network system by using
association rule discover, which is one of the data mining
techniques. Information of incoming and outgoing data in each
times and network bandwidth are network performance
parameters, which needed to solve in the traffic problem. Since
congestion and data loss are important network problems. The result
of this technique can predicted future network traffic. In addition,
this research is useful for network routing selection and network
performance improvement.
Abstract: MATCH project [1] entitle the development of an
automatic diagnosis system that aims to support treatment of colon
cancer diseases by discovering mutations that occurs to tumour
suppressor genes (TSGs) and contributes to the development of
cancerous tumours. The constitution of the system is based on a)
colon cancer clinical data and b) biological information that will be
derived by data mining techniques from genomic and proteomic
sources The core mining module will consist of the popular, well
tested hybrid feature extraction methods, and new combined
algorithms, designed especially for the project. Elements of rough
sets, evolutionary computing, cluster analysis, self-organization maps
and association rules will be used to discover the annotations
between genes, and their influence on tumours [2]-[11].
The methods used to process the data have to address their high
complexity, potential inconsistency and problems of dealing with the
missing values. They must integrate all the useful information
necessary to solve the expert's question. For this purpose, the system
has to learn from data, or be able to interactively specify by a domain
specialist, the part of the knowledge structure it needs to answer a
given query. The program should also take into account the
importance/rank of the particular parts of data it analyses, and adjusts
the used algorithms accordingly.
Abstract: Estimation time and cost of work completion in a
project and follow up them during execution are contributors to
success or fail of a project, and is very important for project
management team. Delivering on time and within budgeted cost
needs to well managing and controlling the projects. To dealing with
complex task of controlling and modifying the baseline project
schedule during execution, earned value management systems have
been set up and widely used to measure and communicate the real
physical progress of a project. But it often fails to predict the total
duration of the project. In this paper data mining techniques is used
predicting the total project duration in term of Time Estimate At
Completion-EAC (t). For this purpose, we have used a project with
90 activities, it has updated day by day. Then, it is used regular
indexes in literature and applied Earned Duration Method to
calculate time estimate at completion and set these as input data for
prediction and specifying the major parameters among them using
Clem software. By using data mining, the effective parameters on
EAC and the relationship between them could be extracted and it is
very useful to manage a project with minimum delay risks. As we
state, this could be a simple, safe and applicable method in prediction
the completion time of a project during execution.
Abstract: In the era of great competition, understanding and satisfying
customers- requirements are the critical tasks for a company
to make a profits. Customer relationship management (CRM) thus
becomes an important business issue at present. With the help of
the data mining techniques, the manager can explore and analyze
from a large quantity of data to discover meaningful patterns and
rules. Among all methods, well-known association rule is most
commonly seen. This paper is based on Apriori algorithm and uses
genetic algorithms combining a data mining method to discover fuzzy
classification rules. The mined results can be applied in CRM to
help decision marker make correct business decisions for marketing
strategies.
Abstract: In this paper we used data mining techniques to
identify outlier patients who are using large amount of drugs over a
long period of time. Any healthcare or health insurance system
should deal with the quantities of drugs utilized by chronic diseases
patients. In Kingdom of Bahrain, about 20% of health budget is spent
on medications. For the managers of healthcare systems, there is no
enough information about the ways of drug utilization by chronic
diseases patients, is there any misuse or is there outliers patients. In
this work, which has been done in cooperation with information
department in the Bahrain Defence Force hospital; we select the data
for Cardiac patients in the period starting from 1/1/2008 to
December 31/12/2008 to be the data for the model in this paper. We
used three techniques for finding the drug utilization for cardiac
patients. First we applied a clustering technique, followed by
measuring of clustering validity, and finally we applied a decision
tree as classification algorithm. The clustering results is divided into
three clusters according to the drug utilization, for 1603 patients, who
received 15,806 prescriptions during this period can be partitioned
into three groups, where 23 patients (2.59%) who received 1316
prescriptions (8.32%) are classified to be outliers. The classification
algorithm shows that the use of average drug utilization and the age,
and the gender of the patient can be considered to be the main
predictive factors in the induced model.
Abstract: Currently, web usage make a huge data from a lot of
user attention. In general, proxy server is a system to support web
usage from user and can manage system by using hit rates. This
research tries to improve hit rates in proxy system by applying data
mining technique. The data set are collected from proxy servers in the
university and are investigated relationship based on several features.
The model is used to predict the future access websites. Association
rule technique is applied to get the relation among Date, Time, Main
Group web, Sub Group web, and Domain name for created model.
The results showed that this technique can predict web content for the
next day, moreover the future accesses of websites increased from
38.15% to 85.57 %.
This model can predict web page access which tends to increase
the efficient of proxy servers as a result. In additional, the
performance of internet access will be improved and help to reduce
traffic in networks.