Abstract: Interest in Science, Technology, Engineering and Mathematics
(STEM) education, especially Computer Science education, has
seen a drastic increase across the country. This fuels efforts towards
recruiting and admitting a diverse population of students. The
changing conditions in terms of student population, diversity
and expected teaching and learning outcomes therefore provide a platform
for the use of innovative teaching models and technologies. It is
necessary that the methods adopted also concentrate on
raising the quality of such innovations and have a positive impact on
student learning. Light-Weight Team is an active learning pedagogy
that is considered a low-stakes activity and has little or
no direct impact on student grades. Emotion plays a major role in
students’ motivation to learn. In this work we use student
feedback data, with emotion classification, collected through surveys
at a public research institution in the United States. For this purpose
we use the Actionable Pattern Discovery method. Actionable patterns are
patterns that provide suggestions in the form of rules to help the user
achieve better outcomes. The proposed method provides meaningful insight
into changes that can be incorporated in the Light-Weight Team
activities and the resources utilized in the course. The results suggest how
to move student emotions to a more positive state, focusing in particular
on the emotions ‘Trust’ and ‘Joy’.
Abstract: This paper presents a classifier ensemble approach for
predicting the survivability of breast cancer patients using the
latest database version of the Surveillance, Epidemiology, and End
Results (SEER) Program of the National Cancer Institute. The system
consists of two main components: feature selection and classifier
ensemble. The feature selection component divides the
features in the SEER database into four groups. It then tries to find
the most important features among the four groups that maximize the
weighted average F-score of a given classification algorithm. The
ensemble component uses three different classifiers, each of which
models a different set of SEER features chosen by the feature
selection module. On top of them, another classifier gives
the final decision based on the output decisions and confidence
scores of the underlying classifiers. Different classification
algorithms have been examined; the best setup uses the
decision tree, Bayesian network, and Naïve Bayes algorithms for the
underlying classifiers and Naïve Bayes for the classifier ensemble
step. The system outperforms all published systems to date when
evaluated against exactly the same SEER data (period of 1973-2002).
It achieves a weighted average F-score of 87.39%, compared to 85.82% and
81.34% for the other published systems. By increasing the data size to
cover the whole database (period of 1973-2014), the overall weighted
average F-score rises to 92.4% on the held-out unseen test set.
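The abstract reports results as a weighted average F-score without defining it. Under the usual definition (assumed here), it is the mean of per-class F1 scores, each weighted by that class's share of the true labels. A minimal stdlib sketch; the labels `"alive"` and `"dead"` are hypothetical:

```python
from collections import Counter


def weighted_f_score(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class's F1 by its share of the true labels
        score += (n_true / total) * f1
    return score


# Hypothetical survivability labels for four patients
y_true = ["alive", "alive", "alive", "dead"]
y_pred = ["alive", "alive", "dead", "dead"]
print(round(weighted_f_score(y_true, y_pred), 4))  # 0.7667
```

This corresponds to the `average="weighted"` option of scikit-learn's `f1_score`; whether the paper's tooling computes the score in exactly this way is an assumption.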
Abstract: Throughout history, people have made estimates
and inferences about the future by using their past experiences.
Developing information technologies and improvements in
database management systems make it possible to extract useful
information from the knowledge at hand for strategic decisions.
Various methods have been developed for this purpose, and data mining by
association rule learning is one of them. The Apriori algorithm,
one of the well-known association rule learning algorithms, is not
commonly used on spatio-temporal data sets. However, it is possible
to embed time and space features into the data sets and make the Apriori
algorithm a suitable data mining technique for learning spatio-temporal
association rules. Lake Van, the largest lake in Turkey, lies in a
closed basin. As a result, the volume of the lake increases or
decreases with changes in the amount of water it holds. In this study,
evaporation, humidity, lake altitude, amount of rainfall and
temperature parameters recorded in the Lake Van region over the
years are used by the Apriori algorithm, and a spatio-temporal data
mining application is developed to identify overflows and newly formed
soil regions (underflows) occurring in the coastal parts of
Lake Van. Identifying possible reasons for overflows and underflows
may help alert the experts to take precautions and make the
necessary investments.
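Embedding time and space into the data, as described above, can be as simple as turning each record into a set of `attribute=value` items, so that standard Apriori runs unchanged. The sketch below is a minimal level-wise Apriori (count, join, prune) plus a rule-confidence helper. The item names and records are hypothetical and do not come from the Lake Van data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count step: keep candidates whose support reaches the threshold
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join + prune: build k-itemsets whose (k-1)-subsets are all frequent
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def confidence(frequent, antecedent, consequent):
    """conf(A -> C) = support(A and C) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical records mixing a time item, a space item and measurements
records = [
    {"season=spring", "coast=north", "rain=high", "level=overflow"},
    {"season=spring", "coast=north", "rain=high", "level=overflow"},
    {"season=summer", "coast=south", "rain=low", "level=underflow"},
    {"season=spring", "coast=south", "rain=low", "level=normal"},
]
freq = apriori(records, min_support=0.5)
print(confidence(freq, {"season=spring", "coast=north", "rain=high"},
                 {"level=overflow"}))  # 1.0
```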
Abstract: World rankings of the output of China's main agricultural commodities for 1978, 1980, 1990, 2000, 2006, 2007 and 2008 have been released in the United Nations FAO Database. Unfortunately, the world ranking of China's cotton lint output for 2008 is missing. This paper uses sequential data mining methods with decision rules to fill this gap. This new data mining method will help further improve the United Nations FAO Database.
Abstract: Spatial outliers in remotely sensed imagery represent
observed quantities showing unusual values compared to their
neighboring pixel values. Various methods have been developed to detect
spatial outliers based on spatial autocorrelation in statistics and data
mining. These methods may be applied to detecting forest fire pixels
in MODIS imagery from NASA's AQUA satellite, because
forest fire detection can be framed as finding spatial
outliers using the spatial variation of brightness temperature. This point is
what distinguishes our approach from traditional fire detection
methods. In this paper, we propose a graph-based forest fire detection
algorithm based on spatial outlier detection methods, and test
the proposed algorithm to evaluate its applicability. For this purpose, the
ordinary scatter plot and Moran's scatter plot were used. To
evaluate the proposed algorithm, the results were compared with the
MODIS fire product provided by the NASA MODIS Science Team,
and the comparison indicated that the proposed algorithm can potentially
detect fire pixels.
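Moran's scatter plot places each pixel's standardized value z against its spatial lag, which is the average z of its neighbors. A hot pixel surrounded by cooler pixels then falls in the "high-low" quadrant, which is why the plot can flag candidate fire pixels. The sketch below assumes a 3x3 (queen) neighborhood with equal weights and a hypothetical threshold `z_min=1.0`. It illustrates the scatter-plot idea only and is not the paper's graph-based algorithm.

```python
import statistics


def moran_scatter(grid):
    """Map each pixel (row, col) to (z, lag): its standardised value and the mean
    z of its queen-contiguity neighbours (equal, row-standardised weights)."""
    values = [v for row in grid for v in row]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumes the grid is not constant
    rows, cols = len(grid), len(grid[0])
    z = [[(grid[r][c] - mean) / std for c in range(cols)] for r in range(rows)]
    points = {}
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                z[rr][cc]
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if (rr, cc) != (r, c) and 0 <= rr < rows and 0 <= cc < cols
            ]
            points[(r, c)] = (z[r][c], sum(neighbours) / len(neighbours))
    return points


def high_low_pixels(points, z_min=1.0):
    """Pixels in the high-low quadrant: hot pixel, cooler-than-average neighbours.
    z_min is a hypothetical threshold, not a value from the paper."""
    return sorted(p for p, (zi, lag) in points.items() if zi >= z_min and lag < 0)


# Hypothetical brightness temperatures (kelvin) with one hot pixel in the centre
grid = [
    [300, 300, 300],
    [300, 360, 300],
    [300, 300, 300],
]
print(high_low_pixels(moran_scatter(grid)))  # [(1, 1)]
```

For flagged pixels the local Moran statistic z × lag is negative, which marks them as dissimilar to their surroundings.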
Abstract: Network security attacks are violations of
information security policy that have received much attention from the
computational intelligence community in recent decades. Data mining
has become a very useful technique for detecting network intrusions
by extracting useful knowledge from large amounts of network data
or logs. The naïve Bayesian classifier is one of the most popular data
mining algorithms for classification, and it provides an optimal way
to predict the class of an unknown example. It has been found that
a single set of probabilities derived from the data is not sufficient for a
good classification rate. In this paper, we propose a new learning
algorithm for mining network logs to detect network intrusions
through a naïve Bayesian classifier, which first clusters the network
logs into several groups based on the similarity of logs, and then
calculates the prior and conditional probabilities for each group of
logs. To classify a new log, the algorithm checks which cluster
the log belongs to and then uses that cluster's probability set to classify
the new log. We tested the performance of our proposed algorithm
on the KDD99 benchmark network intrusion detection dataset,
and the experimental results showed that it improves detection rates
and reduces false positives for different types of network
intrusions.
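The two-stage idea above (assign a log to a cluster, then classify it with that cluster's own priors and conditional probabilities) can be sketched as follows. The abstract does not name the clustering algorithm or the similarity measure. As hypothetical stand-ins, the sketch assigns each log to the nearest of a fixed set of prototype logs by Hamming distance and uses Laplace smoothing. The field values are toy records loosely shaped like KDD99's protocol, service and flag fields.

```python
import math
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of categorical fields on which two logs differ."""
    return sum(x != y for x, y in zip(a, b))


class ClusteredNaiveBayes:
    """Naive Bayes with a separate probability set per cluster of logs."""

    def __init__(self, prototypes, alpha=1.0):
        self.prototypes = prototypes  # hypothetical: one representative log per cluster
        self.alpha = alpha  # Laplace smoothing constant

    def _cluster(self, x, candidates):
        return min(candidates, key=lambda k: hamming(self.prototypes[k], x))

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[self._cluster(x, range(len(self.prototypes)))].append((x, label))
        # Distinct values per field, used in the smoothing denominator
        self.n_values = [len({x[j] for x in X}) for j in range(len(X[0]))]
        self.models = {}
        for k, rows in groups.items():
            priors = Counter(label for _, label in rows)
            cond = defaultdict(Counter)  # (field index, label) -> value counts
            for x, label in rows:
                for j, v in enumerate(x):
                    cond[(j, label)][v] += 1
            self.models[k] = (priors, cond, len(rows))
        return self

    def predict(self, x):
        # Use the probability set of the nearest cluster that has training data
        priors, cond, n = self.models[self._cluster(x, self.models)]

        def log_posterior(label):
            score = math.log(priors[label] / n)
            for j, v in enumerate(x):
                count = cond[(j, label)][v]
                score += math.log(
                    (count + self.alpha) / (priors[label] + self.alpha * self.n_values[j])
                )
            return score

        return max(priors, key=log_posterior)


# Hypothetical (protocol, service, flag) logs
prototypes = [("tcp", "http", "SF"), ("udp", "dns", "SF")]
X = [
    ("tcp", "http", "SF"), ("tcp", "http", "SF"),
    ("tcp", "http", "REJ"), ("tcp", "http", "REJ"),
    ("udp", "dns", "SF"), ("udp", "dns", "SF"), ("udp", "private", "SF"),
]
y = ["normal", "normal", "attack", "attack", "normal", "normal", "attack"]
model = ClusteredNaiveBayes(prototypes).fit(X, y)
print(model.predict(("tcp", "http", "REJ")))  # attack
```

In practice the prototypes would come from a clustering step such as k-modes. They are fixed here only to keep the example short.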
Abstract: Clustering methods developed in
data mining theory can be successfully applied to the investigation of
different kinds of dependencies between environmental conditions
and human activities. It is known that environmental
parameters such as temperature, relative humidity, atmospheric
pressure and illumination have significant effects on human
mental performance. To investigate the effect of these parameters, a data
mining clustering technique based on entropy and the Information Gain
Ratio (IGR), K(Y/X) = (H(X) − H(Y/X))/H(Y), is used, where
H(Y) = −Σ Pi ln(Pi). This technique allows the boundaries of
clusters to be adjusted. It is shown that the IGR grows
monotonically with the degree of connectivity
between two variables. Compared with, for example, correlation analysis,
this approach has some advantages owing to its relatively
lower sensitivity to the shape of functional dependencies. A variant of an
algorithm implementing the proposed method, with some analysis of
the above problem of environmental effects, is also presented. It is
shown that the proposed method converges in a finite number of steps.
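The IGR above can be computed directly from empirical distributions. The sketch below implements H with natural logarithms and the conditional entropy H(Y/X), then evaluates K(Y/X) exactly as the abstract writes it. When H(X) = H(Y), the expression coincides with the familiar normalized information gain (H(Y) − H(Y/X))/H(Y). The variables and readings are hypothetical, and the cluster-boundary adjustment step is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """H = -sum p_i ln p_i over the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """H(Y/X) = sum over x of P(x) * H(Y | X = x)."""
    n = len(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def igr(y, x):
    """K(Y/X) = (H(X) - H(Y/X)) / H(Y), as written in the abstract."""
    return (entropy(x) - conditional_entropy(y, x)) / entropy(y)


# Hypothetical readings: temperature band (X) and performance level (Y)
temperature = ["low", "low", "high", "high"]
performance = ["slow", "slow", "fast", "fast"]
print(igr(performance, temperature))  # 1.0
```

A fully dependent pair gives 1.0, and an independent pair with equal entropies gives 0.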
Abstract: A typical Intelligent Decision Support System is based on four components: its design comprises a Data Warehouse, Online Analytical Processing, Data Mining and model-based Decision Support, and such a system is called a Decision Support System Based on Data Warehouse (DSSBDW). This approach uses ETL, OLAP and DM as its means of implementation, and integrates traditional model-driven DSS and data-driven DSS into a whole. For this kind of system, this paper analyzes the DSSBDW architecture and the DW model, and discusses the following key issues: ETL design and realization; metadata management technology using XML; SQL implementation, performance optimization and data mapping in OLAP. Finally, it illustrates the design principles and methods of the DW in DSSBDW.
Abstract: Web usage mining is an interesting application of data
mining which provides insight into customer behaviour on the Internet.
An important technique for discovering user access and navigation trails
is based on sequential pattern mining. One of the
key challenges for web access pattern mining is tackling the problem
of mining richly structured patterns. This paper proposes a novel
model called Web Access Patterns Graph (WAP-Graph) to represent all of
the access patterns from web mining graphically. WAP-Graph
also motivates the search for new structural relation patterns, i.e.
Concurrent Access Patterns (CAP), to identify and predict more
complex web page requests. Corresponding CAP mining and modelling
methods are proposed and shown to be effective in the
search for and representation of concurrency between access patterns
on the web. From experiments conducted on large-scale synthetic
sequence data as well as real web access data, it is demonstrated that
CAP mining provides a powerful method for structural knowledge
discovery, which can be visualised through the CAP-Graph model.
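The abstract does not define Concurrent Access Patterns precisely. The sketch therefore covers only the building block it names: the support of a sequential access pattern, which is the fraction of sessions containing the pattern's pages in order with gaps allowed. The sessions and page names are hypothetical, and WAP-Graph and CAP mining themselves are not reproduced.

```python
def contains_in_order(session, pattern):
    """True if the pages of `pattern` appear in `session` in the same order,
    not necessarily adjacent."""
    pages = iter(session)
    # `in` on an iterator consumes it, so each match starts after the previous one
    return all(page in pages for page in pattern)


def pattern_support(sessions, pattern):
    """Fraction of sessions that contain `pattern` as a subsequence."""
    return sum(contains_in_order(s, pattern) for s in sessions) / len(sessions)


# Hypothetical navigation sessions
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "about"],
    ["home", "products", "checkout"],
    ["search", "products", "cart"],
]
print(pattern_support(sessions, ["products", "cart"]))  # 0.5
print(pattern_support(sessions, ["cart", "products"]))  # 0.0
```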
Abstract: Rule discovery is an important technique for mining knowledge from large databases. The use of objective measures for discovering interesting rules leads to another data mining problem, although one of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules and ultimately improve the overall efficiency of the KDD process. In this paper we study the novelty of discovered rules as a subjective measure of interestingness. We propose a hybrid approach that uses objective and subjective measures to quantify the novelty of discovered rules in terms of their deviations from known rules. We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to a user-specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are quite promising.
Abstract: Recently, information security has become a key issue
in information technology as computer systems are exposed to an
increasing number of security threats and breaches. A
variety of intrusion detection systems (IDS) have been employed for
protecting computers and networks from malicious network-based or
host-based attacks over the last decades, using approaches ranging from
traditional statistical methods to new data mining techniques. However,
today's commercially available intrusion detection systems are
signature-based and are not capable of detecting unknown attacks. In
this paper, we present a new learning algorithm for an anomaly-based
network intrusion detection system using a decision tree algorithm that
distinguishes attacks from normal behavior and identifies different types of
intrusions. Experimental results on the KDD99 benchmark network
intrusion detection dataset demonstrate that the proposed learning
algorithm achieves a 98% detection rate (DR), which is compared with
that of other existing methods.
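The detection rate quoted above is not defined in the abstract. A common convention in IDS evaluation, assumed here, is DR = TP / (TP + FN), reported together with the false positive rate FPR = FP / (FP + TN). The sketch also assumes that an attack record counts as detected when it receives any attack label, even if the predicted attack type is wrong. The label strings are hypothetical.

```python
def detection_rates(y_true, y_pred, normal="normal"):
    """Return (detection rate, false positive rate) for an IDS.

    DR  = TP / (TP + FN): share of attack records flagged as any attack
          (assumption: the predicted attack type need not match).
    FPR = FP / (FP + TN): share of normal records wrongly flagged.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t != normal and p != normal for t, p in pairs)
    fn = sum(t != normal and p == normal for t, p in pairs)
    fp = sum(t == normal and p != normal for t, p in pairs)
    tn = sum(t == normal and p == normal for t, p in pairs)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical true and predicted labels
y_true = ["normal", "normal", "dos", "probe", "dos"]
y_pred = ["normal", "dos", "dos", "normal", "dos"]
dr, fpr = detection_rates(y_true, y_pred)
print(round(dr, 3), fpr)  # 0.667 0.5
```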
Abstract: This paper presents a system called EART (Extract
Association Rules from Text) for discovering association rules from
collections of unstructured documents. The EART system
handles text only, not images or figures. EART discovers association
rules amongst keywords labeling the collection of textual documents.
The main characteristic of EART is that the system integrates XML
technology (to transform unstructured documents into structured
documents) with an Information Retrieval scheme (TF-IDF) and a Data
Mining technique for association rule extraction. EART depends on
word features to extract association rules. It consists of four phases:
structure phase, index phase, text mining phase and visualization
phase. Our work depends on analyzing the keywords in the
extracted association rules through the co-occurrence of the keywords
in one sentence of the original text and the presence of the keywords
in one sentence without co-occurrence. Experiments were applied to a
collection of scientific documents selected from MEDLINE that are
related to the outbreak of the H5N1 avian influenza virus.
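The index phase relies on TF-IDF, which comes in several variants. The abstract does not say which one EART uses. The sketch below assumes term frequency normalized by document length and idf = ln(N / df). A term that appears in every document therefore gets weight 0. The keyword lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document,
    using tf = count / doc length and idf = ln(N / df) (one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical keyword lists extracted from three abstracts
docs = [
    ["h5n1", "avian", "influenza", "virus"],
    ["h5n1", "outbreak", "virus"],
    ["avian", "outbreak", "outbreak", "virus"],
]
w = tf_idf(docs)
print(round(w[2]["outbreak"], 4))  # 0.2027
```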
Abstract: The vast amount of information hidden in huge
databases has created tremendous interest in the field of data
mining. This paper examines the possibility of using data clustering
techniques in oral medicine to identify functional relationships
between different attributes and to classify similar patient
examinations. Commonly used data clustering algorithms are
reviewed, and several interesting results have been
obtained.
Abstract: The need for efficient information retrieval has increased
more than ever in recent years because of the frequent use of
digital information in our lives. Much work has been done in the area of
textual information, but little progress can be found for multimedia
information. For text-based information, new technologies such as data
mining and data marts, which grew out of the basic concept of the
database introduced around 1960, are now in use.
In image search, and especially in image identification,
computerized systems are at a very early stage. Even in the area of image
search, we cannot see as much progress as in the case of text-based
search techniques. One main reason for this is the widespread roots
of image search, in which many areas such as artificial intelligence,
statistics, image processing and pattern recognition play a role.
Human psychology, perception and cultural diversity also have
their share in the design of a good and efficient image recognition
and retrieval system.
A new object-based search technique is presented in this paper
in which objects in the image are identified on the basis of their
geometrical shapes and other features such as color and texture, and
object correlation augments this search process.
To focus on object identification, simple images were
selected for this work to reduce the role of segmentation in the overall
process; however, the same technique can also be applied to other
images.
Abstract: Chess is an indoor game that improves a player's
confidence, concentration, planning skills and
knowledge. The main objective of this paper is to help chess
players improve their openings using data mining
techniques. Budding chess players usually practice by analyzing
various existing openings. When they analyze and correlate
thousands of openings, the task becomes tedious and complex. The
work in this paper analyzes the best lines of the Blackmar-Diemer
Gambit (BDG), which opens with 1.d4 by White, using data
mining analysis. The analysis is carried out on a collection of winning games
by applying association rules. The first step of this analysis is
assigning variables to each distinct move sequence. In the second
step, sequence association rules were generated and their support and
confidence factors were calculated, which helps us find the best
subsequent chess moves that may lead to a winning position.
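The support and confidence of a sequence rule "opening prefix → next move" can be computed directly from a collection of games. In the sketch below, support is the fraction of games that begin with the prefix followed by the move, and confidence divides that by the support of the prefix alone. As an assumption, the sketch uses move strings in standard algebraic notation in place of the paper's variable encoding. The game list is hypothetical.

```python
def prefix_support(games, moves):
    """Fraction of games whose opening starts with exactly this move sequence."""
    return sum(game[: len(moves)] == moves for game in games) / len(games)


def sequence_rule(games, prefix, next_move):
    """Support and confidence of the rule: prefix -> next_move."""
    rule_support = prefix_support(games, prefix + [next_move])
    base = prefix_support(games, prefix)
    return rule_support, (rule_support / base if base else 0.0)


# Hypothetical winning BDG games (SAN moves, alternating White/Black)
games = [
    ["d4", "d5", "e4", "dxe4", "Nc3", "Nf6", "f3"],
    ["d4", "d5", "e4", "dxe4", "Nc3", "Nf6", "f3"],
    ["d4", "d5", "e4", "dxe4", "Nc3", "Nf6", "Bg5"],
    ["d4", "d5", "e4", "dxe4", "Nc3", "Bf5"],
]
prefix = ["d4", "d5", "e4", "dxe4", "Nc3", "Nf6"]
support, conf = sequence_rule(games, prefix, "f3")
print(support, round(conf, 3))  # 0.5 0.667
```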
Abstract: Maintenance is one of the most important activities in
the shipyard industry. However, it is sometimes not supported by
adequate services from the shipyard, where inaccurate estimation of
ship maintenance duration is still common. This makes
accurate estimation of ship maintenance duration crucial. This study uses a
data mining approach, namely CART (Classification and Regression
Tree), to estimate the duration of ship maintenance, limited to
dock works, known as dry docking. Using the volume
of dock works as the input for estimating maintenance duration, four
classes of dry docking duration were obtained, each with its own linear
model and job criteria. These linear models can then be
used to estimate the duration of dry docking based on job criteria.
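CART grows a regression tree by repeatedly choosing the threshold that minimizes the total squared error of the target on each side. The sketch makes one such split on dock-work volume and fits an ordinary least-squares line to one resulting group, echoing the abstract's pairing of duration classes with linear models. Applying the split recursively, for example to two levels, would give up to four groups. The data and units are hypothetical, and the sketch is not the paper's fitted model.

```python
def sse(values):
    """Sum of squared errors around the mean (CART's regression impurity)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total SSE) of the best binary split on x."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, cost)
    return best


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one group."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Hypothetical dock-work volumes and dry-docking durations (days)
volume = [10, 20, 30, 100, 110, 120]
duration = [5, 7, 9, 30, 32, 34]
threshold, _ = best_split(volume, duration)
print(threshold)  # 65.0
left = [(v, d) for v, d in zip(volume, duration) if v <= threshold]
print(fit_line([v for v, _ in left], [d for _, d in left]))
```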