Abstract: It is important to predict yield in semiconductor test process in order to increase yield. In this study, yield prediction means finding out defective die, wafer or lot effectively. Semiconductor test process consists of some test steps and each test includes various test items. In other world, test data has a big and complicated characteristic. It also is disproportionably distributed as the number of data belonging to FAIL class is extremely low. For yield prediction, general data mining techniques have a limitation without any data preprocessing due to eigen properties of test data. Therefore, this study proposes an under-sampling method using support vector machine (SVM) to eliminate an imbalanced characteristic. For evaluating a performance, randomly under-sampling method is compared with the proposed method using actual semiconductor test data. As a result, sampling method using SVM is effective in generating robust model for yield prediction.
Abstract: In this paper we use data mining techniques to investigate factors that contribute significantly to enhancing the risk of acute coronary syndrome. We assume that the dependent variable is diagnosis – with dichotomous values showing presence or absence of disease. We have applied binary regression to the factors affecting the dependent variable. The data set has been taken from two different cardiac hospitals of Karachi, Pakistan. We have total sixteen variables out of which one is assumed dependent and other 15 are independent variables. For better performance of the regression model in predicting acute coronary syndrome, data reduction techniques like principle component analysis is applied. Based on results of data reduction, we have considered only 14 out of sixteen factors.
Abstract: The purpose of determining impact significance is to
place value on impacts. Environmental impact assessment review is a
process that judges whether impact significance is acceptable or not in
accordance with the scientific facts regarding environmental,
ecological and socio-economical impacts described in environmental
impact statements (EIS) or environmental impact assessment reports
(EIAR). The first aim of this paper is to summarize the criteria of
significance evaluation from the past review results and accordingly
utilize fuzzy logic to incorporate these criteria into scientific facts. The
second aim is to employ data mining technique to construct an EIS or
EIAR prediction model for reviewing results which can assist
developers to prepare and revise better environmental management
plans in advance. The validity of the previous prediction model
proposed by authors in 2009 is 92.7%. The enhanced validity in this
study can attain 100.0%.
Abstract: Recommender systems are usually regarded as an
important marketing tool in the e-commerce. They use important
information about users to facilitate accurate recommendation. The
information includes user context such as location, time and interest
for personalization of mobile users. We can easily collect information
about location and time because mobile devices communicate with the
base station of the service provider. However, information about user
interest can-t be easily collected because user interest can not be
captured automatically without user-s approval process. User interest
usually represented as a need. In this study, we classify needs into two
types according to prior research. This study investigates the
usefulness of data mining techniques for classifying user need type for
recommendation systems. We employ several data mining techniques
including artificial neural networks, decision trees, case-based
reasoning, and multivariate discriminant analysis. Experimental
results show that CHAID algorithm outperforms other models for
classifying user need type. This study performs McNemar test to
examine the statistical significance of the differences of classification
results. The results of McNemar test also show that CHAID performs
better than the other models with statistical significance.
Abstract: Diagnosis can be achieved by building a model of a
certain organ under surveillance and comparing it with the real time
physiological measurements taken from the patient. This paper deals
with the presentation of the benefits of using Data Mining techniques
in the computer-aided diagnosis (CAD), focusing on the cancer
detection, in order to help doctors to make optimal decisions quickly
and accurately. In the field of the noninvasive diagnosis techniques,
the endoscopic ultrasound elastography (EUSE) is a recent elasticity
imaging technique, allowing characterizing the difference between
malignant and benign tumors. Digitalizing and summarizing the main
EUSE sample movies features in a vector form concern with the use
of the exploratory data analysis (EDA). Neural networks are then
trained on the corresponding EUSE sample movies vector input in
such a way that these intelligent systems are able to offer a very
precise and objective diagnosis, discriminating between benign and
malignant tumors. A concrete application of these Data Mining
techniques illustrates the suitability and the reliability of this
methodology in CAD.
Abstract: This research aims to create a model for analysis of student motivation behavior on e-Learning based on association rule mining techniques in case of the Information Technology for Communication and Learning Course at Suan Sunandha Rajabhat University. The model was created under association rules, one of the data mining techniques with minimum confidence. The results showed that the student motivation behavior model by using association rule technique can indicate the important variables that influence the student motivation behavior on e-Learning.
Abstract: Data Mining aims at discovering knowledge out of
data and presenting it in a form that is easily comprehensible to
humans. One of the useful applications in Egypt is the Cancer
management, especially the management of Acute Lymphoblastic
Leukemia or ALL, which is the most common type of cancer in
children.
This paper discusses the process of designing a prototype that can
help in the management of childhood ALL, which has a great
significance in the health care field. Besides, it has a social impact
on decreasing the rate of infection in children in Egypt. It also
provides valubale information about the distribution and
segmentation of ALL in Egypt, which may be linked to the possible
risk factors.
Undirected Knowledge Discovery is used since, in the case of this
research project, there is no target field as the data provided is
mainly subjective. This is done in order to quantify the subjective
variables. Therefore, the computer will be asked to identify
significant patterns in the provided medical data about ALL. This
may be achieved through collecting the data necessary for the
system, determimng the data mining technique to be used for the
system, and choosing the most suitable implementation tool for the
domain.
The research makes use of a data mining tool, Clementine, so as to
apply Decision Trees technique. We feed it with data extracted from
real-life cases taken from specialized Cancer Institutes. Relevant
medical cases details such as patient medical history and diagnosis
are analyzed, classified, and clustered in order to improve the disease
management.
Abstract: Hemodialysis patients might suffer from unhealthy
care behaviors or long-term dialysis treatments. Ultimately they need
to be hospitalized. If the hospitalization rate of a hemodialysis center
is high, its quality of service would be low. Therefore, how to decrease
hospitalization rate is a crucial problem for health care. In this study
we combined temporal abstraction with data mining techniques for
analyzing the dialysis patients' biochemical data to develop a decision
support system. The mined temporal patterns are helpful for clinicians
to predict hospitalization of hemodialysis patients and to suggest them
some treatments immediately to avoid hospitalization.
Abstract: This paper describes text mining technique for automatically extracting association rules from collections of textual documents. The technique called, Extracting Association Rules from Text (EART). It depends on keyword features for discover association rules amongst keywords labeling the documents. In this work, the EART system ignores the order in which the words occur, but instead focusing on the words and their statistical distributions in documents. The main contributions of the technique are that it integrates XML technology with Information Retrieval scheme (TFIDF) (for keyword/feature selection that automatically selects the most discriminative keywords for use in association rules generation) and use Data Mining technique for association rules discovery. It consists of three phases: Text Preprocessing phase (transformation, filtration, stemming and indexing of the documents), Association Rule Mining (ARM) phase (applying our designed algorithm for Generating Association Rules based on Weighting scheme GARW) and Visualization phase (visualization of results). Experiments applied on WebPages news documents related to the outbreak of the bird flu disease. The extracted association rules contain important features and describe the informative news included in the documents collection. The performance of the EART system compared with another system that uses the Apriori algorithm throughout the execution time and evaluating extracted association rules.
Abstract: This paper is a description approach to predict
incoming and outgoing data rate in network system by using
association rule discover, which is one of the data mining
techniques. Information of incoming and outgoing data in each
times and network bandwidth are network performance
parameters, which needed to solve in the traffic problem. Since
congestion and data loss are important network problems. The result
of this technique can predicted future network traffic. In addition,
this research is useful for network routing selection and network
performance improvement.
Abstract: MATCH project [1] entitle the development of an
automatic diagnosis system that aims to support treatment of colon
cancer diseases by discovering mutations that occurs to tumour
suppressor genes (TSGs) and contributes to the development of
cancerous tumours. The constitution of the system is based on a)
colon cancer clinical data and b) biological information that will be
derived by data mining techniques from genomic and proteomic
sources The core mining module will consist of the popular, well
tested hybrid feature extraction methods, and new combined
algorithms, designed especially for the project. Elements of rough
sets, evolutionary computing, cluster analysis, self-organization maps
and association rules will be used to discover the annotations
between genes, and their influence on tumours [2]-[11].
The methods used to process the data have to address their high
complexity, potential inconsistency and problems of dealing with the
missing values. They must integrate all the useful information
necessary to solve the expert's question. For this purpose, the system
has to learn from data, or be able to interactively specify by a domain
specialist, the part of the knowledge structure it needs to answer a
given query. The program should also take into account the
importance/rank of the particular parts of data it analyses, and adjusts
the used algorithms accordingly.
Abstract: Estimation time and cost of work completion in a
project and follow up them during execution are contributors to
success or fail of a project, and is very important for project
management team. Delivering on time and within budgeted cost
needs to well managing and controlling the projects. To dealing with
complex task of controlling and modifying the baseline project
schedule during execution, earned value management systems have
been set up and widely used to measure and communicate the real
physical progress of a project. But it often fails to predict the total
duration of the project. In this paper data mining techniques is used
predicting the total project duration in term of Time Estimate At
Completion-EAC (t). For this purpose, we have used a project with
90 activities, it has updated day by day. Then, it is used regular
indexes in literature and applied Earned Duration Method to
calculate time estimate at completion and set these as input data for
prediction and specifying the major parameters among them using
Clem software. By using data mining, the effective parameters on
EAC and the relationship between them could be extracted and it is
very useful to manage a project with minimum delay risks. As we
state, this could be a simple, safe and applicable method in prediction
the completion time of a project during execution.
Abstract: In the era of great competition, understanding and satisfying
customers- requirements are the critical tasks for a company
to make a profits. Customer relationship management (CRM) thus
becomes an important business issue at present. With the help of
the data mining techniques, the manager can explore and analyze
from a large quantity of data to discover meaningful patterns and
rules. Among all methods, well-known association rule is most
commonly seen. This paper is based on Apriori algorithm and uses
genetic algorithms combining a data mining method to discover fuzzy
classification rules. The mined results can be applied in CRM to
help decision marker make correct business decisions for marketing
strategies.
Abstract: In this paper we used data mining techniques to
identify outlier patients who are using large amount of drugs over a
long period of time. Any healthcare or health insurance system
should deal with the quantities of drugs utilized by chronic diseases
patients. In Kingdom of Bahrain, about 20% of health budget is spent
on medications. For the managers of healthcare systems, there is no
enough information about the ways of drug utilization by chronic
diseases patients, is there any misuse or is there outliers patients. In
this work, which has been done in cooperation with information
department in the Bahrain Defence Force hospital; we select the data
for Cardiac patients in the period starting from 1/1/2008 to
December 31/12/2008 to be the data for the model in this paper. We
used three techniques for finding the drug utilization for cardiac
patients. First we applied a clustering technique, followed by
measuring of clustering validity, and finally we applied a decision
tree as classification algorithm. The clustering results is divided into
three clusters according to the drug utilization, for 1603 patients, who
received 15,806 prescriptions during this period can be partitioned
into three groups, where 23 patients (2.59%) who received 1316
prescriptions (8.32%) are classified to be outliers. The classification
algorithm shows that the use of average drug utilization and the age,
and the gender of the patient can be considered to be the main
predictive factors in the induced model.
Abstract: Currently, web usage make a huge data from a lot of
user attention. In general, proxy server is a system to support web
usage from user and can manage system by using hit rates. This
research tries to improve hit rates in proxy system by applying data
mining technique. The data set are collected from proxy servers in the
university and are investigated relationship based on several features.
The model is used to predict the future access websites. Association
rule technique is applied to get the relation among Date, Time, Main
Group web, Sub Group web, and Domain name for created model.
The results showed that this technique can predict web content for the
next day, moreover the future accesses of websites increased from
38.15% to 85.57 %.
This model can predict web page access which tends to increase
the efficient of proxy servers as a result. In additional, the
performance of internet access will be improved and help to reduce
traffic in networks.