Abstract: Server provisioning is one of the most attractive topics in virtualization systems. Virtualization is a method of running multiple independent virtual operating systems on a single physical computer. It is a way of maximizing the use of physical resources to get the most from the investment in hardware. Additionally, it can help to consolidate servers, improve hardware utilization, and reduce the consumption of power and physical space in the data center. However, managing heterogeneous workloads, especially with respect to server resource utilization (so-called provisioning), becomes a challenge. In this paper, a new concept for managing workloads based on user behavior is presented. The experimental results show that user behavior differs with the type of service workload and with time. Understanding user behavior may improve the efficiency of provisioning management. This preliminary study may offer an approach to improving the management of data centers that run heterogeneous workloads under provisioning in virtualization systems.
Abstract: Association rule mining is an important problem in data
mining. The massively increasing volume of data in real-life databases
has motivated researchers to design novel and incremental algorithms
for association rule mining. In this paper, we propose an incremental
association rule mining algorithm that integrates a shocking
interestingness criterion during the process of building the model. A
new interestingness measure, called the shocking measure, is introduced. One
of the main features of the proposed approach is to capture the user
background knowledge, which is monotonically augmented. The
incremental model that reflects the changing data and the user beliefs
is attractive because it makes the overall KDD process more
effective and efficient. We implemented the proposed approach,
tested it on several public datasets, and found the results quite
promising.
Abstract: This paper focuses on the data-driven generation
of fuzzy IF...THEN rules. The resulting fuzzy rule base can be
applied to build a classifier, a model used for prediction, or
it can be applied to form a decision support system. Among
the wide range of possible approaches, the decision tree and
the association-rule-based algorithms are reviewed, and two
new approaches are presented, based on a priori fuzzy-clustering-based
partitioning of the continuous input variables.
An application study is also presented, where the developed
methods are tested on the well-known Wisconsin Breast Cancer
classification problem.
Abstract: Kansei engineering is a technology that
converts human feelings into quantitative terms and helps designers
develop new products that meet customers' expectations. The standard
Kansei engineering procedure involves finding relationships between
human feelings and design elements, and many researchers have
found forward and backward relationships through various soft
computing techniques. In this paper, we propose a framework for
Kansei engineering that links human feelings not only to
design elements but also to the product as a whole, by
constructing association rules. In our experiment, the input is
the emotion scores that subjects give, using semantic differentials,
when they view the whole product. Association
rules are then constructed to discover the combinations of design elements
that affect human feelings. The results of our experiment
suggest patterns of relationships among design elements according to
human feelings, which can be derived from the product as a whole.
Abstract: Biclustering is a very useful data mining technique for
identifying patterns where different genes are co-related based on a
subset of conditions in gene expression analysis. Association rule
mining is an efficient approach to biclustering, as in the
BIMODULE algorithm, but it is sensitive to the values given to its
input parameters and to the discretization procedure used in the
preprocessing step. Moreover, when noise is present, classical association
rule miners discover multiple small fragments of the true bicluster
but miss the true bicluster itself. This paper formally presents a
generalized noise-tolerant bicluster model, termed μBicluster. An
iterative algorithm, termed BIDENS, based on the proposed model
is introduced that can discover a set of k possibly overlapping
biclusters simultaneously. Our model uses a more flexible method to
partition the dimensions to preserve meaningful and significant
biclusters. The proposed algorithm can discover biclusters that
are hard for BIMODULE to discover. Experimental studies on yeast and
human gene expression data and on several artificial datasets show that
our algorithm offers substantial improvements over several
previously proposed biclustering algorithms.
Abstract: In an era of knowledge explosion, the volume of data
grows rapidly day by day. Since data storage is a limited resource,
reducing the space that data occupies becomes a challenging issue.
Data compression provides a good solution that can lower the
required space. Data mining has found many useful applications in recent
years because it can help users discover interesting knowledge in large
databases. However, existing compression algorithms are not
appropriate for data mining. In [1, 2], two different approaches were
proposed to compress databases and then perform the data mining
process. However, both lack the ability to decompress the data to
its original state and to improve data mining performance. In this
research, a new approach called Mining Merged Transactions with the
Quantification Table (M2TQT) is proposed to solve these problems.
M2TQT uses the relationship of transactions to merge related
transactions and builds a quantification table to prune the candidate
itemsets that cannot become frequent, in order to improve
the performance of mining association rules. The experiments show
that M2TQT performs better than existing approaches.
Abstract: There are several approaches to solving the
Quantitative Structure-Activity Relationship (QSAR) problem.
These approaches are based either on statistical methods or on
predictive data mining. Among the statistical methods, one should
consider regression analysis, pattern recognition (such as cluster
analysis, factor analysis and principal components analysis) or partial
least squares. Predictive data mining techniques use neural
networks, genetic programming, or neuro-fuzzy knowledge. These
approaches have low explanatory capability or none at all. This
paper attempts to establish a new approach to solving QSAR
problems using descriptive data mining. This way, the relationship
between the chemical properties and the activity of a substance
would be comprehensibly modeled.
Abstract: This research aims to discover
knowledge for analyzing student motivation behavior in e-Learning
using data mining techniques, taking as a case study the Information
Technology for Communication and Learning course at Suan
Sunandha Rajabhat University. The data mining techniques
applied in this research include association rules and classification
techniques. The results show that data mining techniques can
identify the important variables that influence student motivation
behavior in e-Learning.
Abstract: This paper considers the problem of frequent itemset mining. A new technique is proposed to generate frequent patterns in large databases without time-consuming candidate generation. The technique focuses on transactions rather than on itemsets: the algorithm takes the intersection of one transaction with the other transactions and computes the maximum set of items shared between transactions, instead of creating itemsets and computing their frequencies. Applied to real-life transactions, with some assumptions drawn from real-life data, the technique achieves significant efficiency in generating association rules from databases.
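The transaction-intersection idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name `frequent_by_intersection`, the pairwise intersection strategy, and the toy transactions are all assumptions made for illustration. The sketch intersects every pair of transactions and then counts how many transactions contain each shared item set.

```python
from itertools import combinations


def frequent_by_intersection(transactions, min_support):
    """Return {itemset: count} for item sets shared by pairs of transactions.

    Hypothetical helper: candidates come from pairwise intersections of
    transactions, not from level-wise itemset generation.
    """
    sets = [frozenset(t) for t in transactions]

    # Collect the non-empty item sets shared by each pair of transactions.
    shared_sets = set()
    for a, b in combinations(sets, 2):
        shared = a & b
        if shared:
            shared_sets.add(shared)

    # Count the transactions that contain each shared item set.
    result = {}
    for candidate in shared_sets:
        count = sum(1 for t in sets if candidate <= t)
        if count >= min_support:
            result[candidate] = count
    return result


# Toy data, invented for this example.
transactions = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "milk", "butter"],
]
frequent = frequent_by_intersection(transactions, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

Note that this sketch reports only the item sets that arise as intersections; every subset of a reported item set is at least as frequent, so such subsets are implied rather than listed.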
Abstract: In data mining, association rules are used to find
associations between the different items of a transaction
database. As data are collected and stored, valuable rules can be found
through association rules, which can be applied to help managers
execute marketing strategies and establish sound market frameworks.
This paper aims to use Fuzzy Frequent Pattern growth (FFP-growth)
to derive fuzzy association rules. First, we apply fuzzy
partition methods and decide a membership function of quantitative
value for each transaction item. Next, we implement FFP-growth
to carry out the data mining process. In addition, in order to
understand the impact of the Apriori algorithm and the FFP-growth algorithm
on the execution time and the number of generated association
rules, experiments are performed using different sizes of
databases and thresholds. Lastly, the experimental results show that the FFP-growth
algorithm is more efficient than other existing methods.
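The fuzzy partition step mentioned in this abstract can be sketched as follows. The abstract does not state which membership functions are used, so the triangular shape, the three labels, and the breakpoints in `PARTITIONS` are assumptions chosen only for illustration; the FFP-growth tree construction itself is not shown.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 at or beyond left/right, 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed fuzzy partition of a purchased quantity in the range 0..10.
PARTITIONS = {
    "low": (-5.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 15.0),
}


def fuzzify(quantity):
    """Map a quantity to its membership degree in each fuzzy region."""
    return {label: triangular(quantity, *abc) for label, abc in PARTITIONS.items()}


def fuzzy_support(quantities, label):
    """Average membership of one item's quantities in a fuzzy region."""
    return sum(fuzzify(q)[label] for q in quantities) / len(quantities)


# Quantities of one item across three transactions (invented data).
milk_quantities = [4, 5, 10]
print(fuzzify(4))
print(fuzzy_support(milk_quantities, "medium"))
```

With this partition, adjacent regions overlap so that a quantity between two peaks belongs partly to both, which is what lets a fuzzy miner avoid sharp interval boundaries.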
Abstract: In a virtual organization, a Knowledge Discovery (KD)
service contains distributed data resources and computing grid nodes.
A computational grid is integrated with a data grid to form a Knowledge
Grid, which implements the Apriori algorithm for mining association
rules on a grid network. This paper describes the development of a parallel
and distributed version of the Apriori algorithm on the Globus Toolkit using
the Message Passing Interface extended with Grid Services (MPICH-G2).
The Knowledge Grid is created on top of the data and
computational grids to support decision making in real-time
applications. In this paper, the case study describes design and
implementation of local and global mining of frequent item sets. The
experiments were conducted on different configurations of grid
network, and computation time was recorded for each operation. We
analyzed our results across the various grid configurations; they show
that the speedup in computation time is almost superlinear.
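The split between local and global mining of frequent item sets can be illustrated with a small single-process sketch. It uses plain Python lists to stand in for grid nodes and involves no MPI or Globus calls, so it is not the implementation described in the abstract; the function names and the data partitions are assumptions for illustration. Each "node" counts candidate item sets in its own partition, and the counts are summed to decide global frequency.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """Count every k-item set in one node's local transactions."""
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def global_frequent(partitions, k, min_support):
    """Sum local counts across nodes and keep globally frequent item sets."""
    total = Counter()
    for partition in partitions:
        total.update(local_counts(partition, k))  # stands in for a reduce step
    n = sum(len(p) for p in partitions)
    return {s: c / n for s, c in total.items() if c / n >= min_support}


# Two hypothetical nodes, each holding part of the transaction database.
node_1 = [["a", "b"], ["a", "c"]]
node_2 = [["a", "b", "c"], ["a", "b"]]
frequent_pairs = global_frequent([node_1, node_2], k=2, min_support=0.5)
print(frequent_pairs)
```

In a real distributed setting the summation would be a collective reduction over the nodes; here it is a simple loop so that the example stays self-contained.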
Abstract: Data mining techniques have been used in medical
research for many years and have been known to be effective. In order
to address problems faced by emergency departments, such as long waiting
times, congestion, and delayed patient care, this study concentrates
on building a hybrid methodology, combining data mining techniques
such as association rules and classification trees. The methodology is
applied to real-world emergency data collected from a hospital and is
evaluated by comparison with other techniques. The methodology is
expected to help physicians to make a faster and more accurate
classification of chest pain diseases.
Abstract: A generic and extendible Multi-Agent Data Mining
(MADM) framework, MADMF (the Multi-Agent Data Mining
Framework) is described. The central feature of the framework is that
it avoids the use of agreed meta-language formats by supporting a
framework of wrappers.
The advantage offered is that the framework is easily extendible,
so that further data agents and mining agents can simply be added to
the framework. A demonstration MADMF framework is currently
available. The paper includes details of the MADMF architecture and
the wrapper principle incorporated into it. A full description and
evaluation of the framework's operation is provided by considering
two MADM scenarios.
Abstract: A big organization may have multiple branches spread across different locations. Processing data from these branches becomes a huge task when innumerable transactions take place. Also, branches may be reluctant to forward their data for centralized processing but willing to pass on their association rules. Local mining may also generate a large number of rules. Further, it is not practical to expect all local data sources to be of the same size. A model is proposed for discovering valid rules from data sources of different sizes, where the valid rules are highly weighted rules. These rules can be obtained from the high-frequency rules generated from each of the data sources. A data source selection procedure is considered in order to synthesize rules efficiently. Support Equalization is another proposed method, which focuses on eliminating low-frequency rules at the local sites themselves, thus reducing the number of rules significantly.
Abstract: The inherent flexibility of XML in both structure
and semantics makes mining from XML data a complex task with
more challenges compared to traditional association rule mining in
relational databases. In this paper, we propose a new model for the
effective extraction of generalized association rules from an XML
document collection. We directly use frequent subtree mining
techniques in the discovery process and do not ignore the tree
structure of data in the final rules. The frequent subtrees based on the
user-provided support are split into complementary subtrees to form the
rules. We explain our model in multiple steps, from data preparation
to rule generation.
Abstract: This paper proposes an auto-classification algorithm
of Web pages using Data mining techniques. We consider the
problem of discovering association rules between terms in a set of
Web pages belonging to a category in a search engine database, and
present an auto-classification algorithm for solving this problem that
is fundamentally based on the Apriori algorithm. The proposed
technique has two phases. The first phase is a training phase in which
human experts determine the categories of different Web pages, and
the supervised data mining algorithm combines these categories
with appropriate weighted index terms according to the highest
supported rules among the most frequent words. The second phase is
the categorization phase where a web crawler will crawl through the
World Wide Web to build a database categorized according to the
result of the data mining approach. This database contains URLs and
their categories.
Abstract: The goal of this paper is to segment countries
based on the value of exports from Iran over the 14 years ending in 2005. To measure the dissimilarity among the export baskets of different countries, we define the Dissimilarity Export Basket (DEB) function and
use this distance function in K-means algorithm. The DEB function
is defined based on the concepts of the association rules and the
value of export group-commodities. In this paper, a clustering quality
function and the clusters' intraclass inertia are defined, respectively,
to calculate the optimum number of clusters and to compare the
performance of DEB against Euclidean distance. We also study
the effects of the importance weights in the DEB function on
clustering quality. Lastly, when segmentation is completed, a
designated RFM model is used to analyze the relative profitability of
each cluster.
Abstract: This paper focuses on analyzing medical diagnostic data using classification rules from data mining and context reduction from formal concept analysis. The analysis helps to find redundancies among the various medical examination tests used in the diagnosis of a disease. Classification rules are derived from positive and negative association rules using the concept lattice structure of Formal Concept Analysis. The context reduction technique of Formal Concept Analysis, together with the classification rules, is used to find redundancies among the various medical examination tests. It also determines whether expensive medical tests can be replaced by cheaper ones.
Abstract: To overcome the product overload faced by Internet
shoppers, we introduce a semantic recommendation procedure that
is more efficient when applied to Internet shopping malls. The
suggested procedure recommends semantic products to
customers and is based on Web usage mining, product
classification, association rule mining, and frequent purchasing.
We applied the procedure to the MovieLens data set for
performance evaluation, and some experimental results are provided.
The experimental results have shown superior performance in
terms of coverage and precision.
Abstract: Numerical analysis naturally finds applications in all
fields of engineering and the physical sciences, but in the
21st century, the life sciences and even the arts have adopted
elements of scientific computation. Numerical data analysis
has become a key process in research and development in all of these fields [6].
In this paper, we attempt to analyze specified
numerical patterns using association rule mining
techniques with minimum confidence and minimum support as the mining
criteria. The extracted rules and analyzed results are graphically
demonstrated. Association rules are a simple but very useful form of
data mining that describe the probabilistic co-occurrence of certain
events within a database [7]. They were originally designed to
analyze market-basket data, in which the likelihood of items being
purchased together within the same transaction is analyzed.
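The minimum support and minimum confidence criteria mentioned in this abstract follow the standard definitions: the support of an item set is the fraction of transactions that contain it, and the confidence of a rule A -> B is support(A and B) divided by support(A). A minimal Python sketch is given here; the basket data and the threshold values are invented for illustration and are not taken from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Invented market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

MIN_SUPPORT = 0.5      # assumed threshold
MIN_CONFIDENCE = 0.6   # assumed threshold

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
keep_rule = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
print(rule_support, rule_confidence, keep_rule)
```

A rule is reported only when it passes both thresholds, which is how the two criteria jointly limit the number of extracted rules.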