Abstract: Existing methods of data mining cannot be applied on
spatial data because they require spatial specificity consideration, as
spatial relationships.
This paper focuses on the classification with decision trees, which
are one of the data mining techniques. We propose an extension of
the C4.5 algorithm for spatial data, based on two different approaches
Join materialization and Querying on the fly the different tables.
Similar works have been done on these two main approaches, the
first - Join materialization - favors the processing time in spite of
memory space, whereas the second - Querying on the fly different
tables- promotes memory space despite of the processing time.
The modified C4.5 algorithm requires three entries tables: a target
table, a neighbor table, and a spatial index join that contains the
possible spatial relationship among the objects in the target table and
those in the neighbor table. Thus, the proposed algorithms are applied
to a spatial data pattern in the accidentology domain.
A comparative study of our approach with other works of
classification by spatial decision trees will be detailed.
Abstract: Textual data plays an important role in the modern
world. The possibilities of applying data mining techniques to
uncover hidden information present in large volumes of text
collections is immense. The Growing Self Organizing Map (GSOM)
is a highly successful member of the Self Organising Map family
and has been used as a clustering and visualisation tool across wide
range of disciplines to discover hidden patterns present in the data.
A comprehensive analysis of the GSOM’s capabilities as a text
clustering and visualisation tool has so far not been published. These
functionalities, namely map visualisation capabilities, automatic
cluster identification and hierarchical clustering capabilities are
presented in this paper and are further demonstrated with experiments
on a benchmark text corpus.
Abstract: This paper aims to create the model for student in choosing an emphasized track of student majoring in computer science at Suan Sunandha Rajabhat University. The objective of this research is to develop the suggested system using data mining technique to analyze knowledge and conduct decision rules. Such relationships can be used to demonstrate the reasonableness of student choosing a track as well as to support his/her decision and the system is verified by experts in the field. The sampling is from student of computer science based on the system and the questionnaire to see the satisfaction. The system result is found to be satisfactory by both experts and student as well.
Abstract: This research aims to create a model for analysis of student behavior using Library resources based on data mining technique in case of Suan Sunandha Rajabhat University. The model was created under association rules, Apriori algorithm. The results were found 14 rules and the rules were tested with testing data set and it showed that the ability of classify data was 79.24percent and the MSE was 22.91. The results showed that the user’s behavior model by using association rule technique can use to manage the library resources.
Abstract: The exponential increase in the volume of medical image database has imposed new challenges to clinical routine in maintaining patient history, diagnosis, treatment and monitoring. With the advent of data mining and machine learning techniques it is possible to automate and/or assist physicians in clinical diagnosis. In this research a medical image classification framework using data mining techniques is proposed. It involves feature extraction, feature selection, feature discretization and classification. In the classification phase, the performance of the traditional kNN k nearest neighbor classifier is improved using a feature weighting scheme and a distance weighted voting instead of simple majority voting. Feature weights are calculated using the interestingness measures used in association rule mining. Experiments on the retinal fundus images show that the proposed framework improves the classification accuracy of traditional kNN from 78.57 % to 92.85 %.
Abstract: The most important subtype of non-Hodgkin-s
lymphoma is the Diffuse Large B-Cell Lymphoma. Approximately
40% of the patients suffering from it respond well to therapy,
whereas the remainder needs a more aggressive treatment, in order to
better their chances of survival. Data Mining techniques have helped
to identify the class of the lymphoma in an efficient manner. Despite
that, thousands of genes should be processed to obtain the results.
This paper presents a comparison of the use of various attribute
selection methods aiming to reduce the number of genes to be
searched, looking for a more effective procedure as a whole.
Abstract: Estimates of temperature values at a specific time of day, from daytime and daily profiles, are needed for a number of environmental, ecological, agricultural and technical applications, ranging from natural hazards assessments, crop growth forecasting to design of solar energy systems. The scope of this research is to investigate the efficiency of data mining techniques in estimating minimum, maximum and mean temperature values. For this reason, a number of experiments have been conducted with well-known regression algorithms using temperature data from the city of Patras in Greece. The performance of these algorithms has been evaluated using standard statistical indicators, such as Correlation Coefficient, Root Mean Squared Error, etc.
Abstract: Music Information Retrieval (MIR) and modern data mining techniques are applied to identify style markers in midi music for stylometric analysis and author attribution. Over 100 attributes are extracted from a library of 2830 songs then mined using supervised learning data mining techniques. Two attributes are identified that provide high informational gain. These attributes are then used as style markers to predict authorship. Using these style markers the authors are able to correctly distinguish songs written by the Beatles from those that were not with a precision and accuracy of over 98 per cent. The identification of these style markers as well as the architecture for this research provides a foundation for future research in musical stylometry.
Abstract: For the past one decade, biclustering has become popular data mining technique not only in the field of biological data analysis but also in other applications like text mining, market data analysis with high-dimensional two-way datasets. Biclustering clusters both rows and columns of a dataset simultaneously, as opposed to traditional clustering which clusters either rows or columns of a dataset. It retrieves subgroups of objects that are similar in one subgroup of variables and different in the remaining variables. Firefly Algorithm (FA) is a recently-proposed metaheuristic inspired by the collective behavior of fireflies. This paper provides a preliminary assessment of discrete version of FA (DFA) while coping with the task of mining coherent and large volume bicluster from web usage dataset. The experiments were conducted on two web usage datasets from public dataset repository whereby the performance of FA was compared with that exhibited by other population-based metaheuristic called binary Particle Swarm Optimization (PSO). The results achieved demonstrate the usefulness of DFA while tackling the biclustering problem.
Abstract: Biclustering is a very useful data mining technique for
identifying patterns where different genes are co-related based on a
subset of conditions in gene expression analysis. Association rules
mining is an efficient approach to achieve biclustering as in
BIMODULE algorithm but it is sensitive to the value given to its
input parameters and the discretization procedure used in the
preprocessing step, also when noise is present, classical association
rules miners discover multiple small fragments of the true bicluster,
but miss the true bicluster itself. This paper formally presents a
generalized noise tolerant bicluster model, termed as μBicluster. An
iterative algorithm termed as BIDENS based on the proposed model
is introduced that can discover a set of k possibly overlapping
biclusters simultaneously. Our model uses a more flexible method to
partition the dimensions to preserve meaningful and significant
biclusters. The proposed algorithm allows discovering biclusters that
hard to be discovered by BIMODULE. Experimental study on yeast,
human gene expression data and several artificial datasets shows that
our algorithm offers substantial improvements over several
previously proposed biclustering algorithms.
Abstract: There are several approaches in trying to solve the
Quantitative 1Structure-Activity Relationship (QSAR) problem.
These approaches are based either on statistical methods or on
predictive data mining. Among the statistical methods, one should
consider regression analysis, pattern recognition (such as cluster
analysis, factor analysis and principal components analysis) or partial
least squares. Predictive data mining techniques use either neural
networks, or genetic programming, or neuro-fuzzy knowledge. These
approaches have a low explanatory capability or non at all. This
paper attempts to establish a new approach in solving QSAR
problems using descriptive data mining. This way, the relationship
between the chemical properties and the activity of a substance
would be comprehensibly modeled.
Abstract: The purpose of this research aims to discover the
knowledge for analysis student motivation behavior on e-Learning
based on Data Mining Techniques, in case of the Information
Technology for Communication and Learning Course at Suan
Sunandha Rajabhat University. The data mining techniques was
applied in this research including association rules, classification
techniques. The results showed that using data mining technique can
indicate the important variables that influence the student motivation
behavior on e-Learning.
Abstract: Evidence-based medicine is a new direction in modern healthcare. Its task is to prevent, diagnose and medicate diseases using medical evidence. Medical data about a large patient population is analyzed to perform healthcare management and medical research. In order to obtain the best evidence for a given disease, external clinical expertise as well as internal clinical experience must be available to the healthcare practitioners at right time and in the right manner. External evidence-based knowledge can not be applied directly to the patient without adjusting it to the patient-s health condition. We propose a data warehouse based approach as a suitable solution for the integration of external evidence-based data sources into the existing clinical information system and data mining techniques for finding appropriate therapy for a given patient and a given disease. Through integration of data warehousing, OLAP and data mining techniques in the healthcare area, an easy to use decision support platform, which supports decision making process of care givers and clinical managers, is built. We present three case studies, which show, that a clinical data warehouse that facilitates evidence-based medicine is a reliable, powerful and user-friendly platform for strategic decision making, which has a great relevance for the practice and acceptance of evidence-based medicine.
Abstract: Data mining techniques have been used in medical
research for many years and have been known to be effective. In order
to solve such problems as long-waiting time, congestion, and delayed
patient care, faced by emergency departments, this study concentrates
on building a hybrid methodology, combining data mining techniques
such as association rules and classification trees. The methodology is
applied to real-world emergency data collected from a hospital and is
evaluated by comparing with other techniques. The methodology is
expected to help physicians to make a faster and more accurate
classification of chest pain diseases.
Abstract: Application of Information Technology (IT) has
revolutionized the functioning of business all over the world. Its
impact has been felt mostly among the information of dependent
industries. Tourism is one of such industry. The conceptual
framework in this study represents an innovation of travel
information searching system on mobile devices which is used as
tools to deliver travel information (such as hotels, restaurants, tourist
attractions and souvenir shops) for each user by travelers
segmentation based on data mining technique to segment the tourists-
behavior patterns then match them with tourism products and
services. This system innovation is designed to be a knowledge
incremental learning. It is a marketing strategy to support business to
respond traveler-s demand effectively.
Abstract: This paper proposes an auto-classification algorithm
of Web pages using Data mining techniques. We consider the
problem of discovering association rules between terms in a set of
Web pages belonging to a category in a search engine database, and
present an auto-classification algorithm for solving this problem that
are fundamentally based on Apriori algorithm. The proposed
technique has two phases. The first phase is a training phase where
human experts determines the categories of different Web pages, and
the supervised Data mining algorithm will combine these categories
with appropriate weighted index terms according to the highest
supported rules among the most frequent words. The second phase is
the categorization phase where a web crawler will crawl through the
World Wide Web to build a database categorized according to the
result of the data mining approach. This database contains URLs and
their categories.
Abstract: this article proposed a methodology for computer
numerical control (CNC) machine scoring. The case study company
is a manufacturer of hard disk drive parts in Thailand. In this
company, sample of parts manufactured from CNC machine are
usually taken randomly for quality inspection. These inspection data
were used to make a decision to shut down the machine if it has
tendency to produce parts that are out of specification. Large amount
of data are produced in this process and data mining could be very
useful technique in analyzing them. In this research, data mining
techniques were used to construct a machine scoring model called
'machine priority assessment model (MPAM)'. This model helps to
ensure that the machine with higher risk of producing defective parts
be inspected before those with lower risk. If the defective prone
machine is identified sooner, defective part and rework could be
reduced hence improving the overall productivity. The results
showed that the proposed method can be successfully implemented
and approximately 351,000 baht of opportunity cost could have
saved in the case study company.
Abstract: This study proposes a novel recommender system to
provide the advertisements of context-aware services. Our proposed
model is designed to apply a modified collaborative filtering (CF)
algorithm with regard to the several dimensions for the personalization
of mobile devices – location, time and the user-s needs type. In
particular, we employ a classification rule to understand user-s needs
type using a decision tree algorithm. In addition, we collect primary
data from the mobile phone users and apply them to the proposed
model to validate its effectiveness. Experimental results show that the
proposed system makes more accurate and satisfactory advertisements
than comparative systems.
Abstract: The paper gives the pilot results of the project that is
oriented on the use of data mining techniques and knowledge
discoveries from production systems through them. They have been
used in the management of these systems. The simulation models of
manufacturing systems have been developed to obtain the necessary
data about production. The authors have developed the way of
storing data obtained from the simulation models in the data
warehouse. Data mining model has been created by using specific
methods and selected techniques for defined problems of production
system management. The new knowledge has been applied to
production management system. Gained knowledge has been tested
on simulation models of the production system. An important benefit
of the project has been proposal of the new methodology. This
methodology is focused on data mining from the databases that store
operational data about the production process.
Abstract: The healthcare environment is generally perceived as
being information rich yet knowledge poor. However, there is a lack
of effective analysis tools to discover hidden relationships and trends
in data. In fact, valuable knowledge can be discovered from
application of data mining techniques in healthcare system. In this
study, a proficient methodology for the extraction of significant
patterns from the Coronary Heart Disease warehouses for heart
attack prediction, which unfortunately continues to be a leading cause
of mortality in the whole world, has been presented. For this purpose,
we propose to enumerate dynamically the optimal subsets of the
reduced features of high interest by using rough sets technique
associated to dynamic programming. Therefore, we propose to
validate the classification using Random Forest (RF) decision tree to
identify the risky heart disease cases. This work is based on a large
amount of data collected from several clinical institutions based on
the medical profile of patient. Moreover, the experts- knowledge in
this field has been taken into consideration in order to define the
disease, its risk factors, and to establish significant knowledge
relationships among the medical factors. A computer-aided system is
developed for this purpose based on a population of 525 adults. The
performance of the proposed model is analyzed and evaluated based
on set of benchmark techniques applied in this classification problem.