Abstract: Big Data represents the recent technology of manipulating voluminous and unstructured data sets over multiple sources. Therefore, NOSQL appears to handle the problem of unstructured data. Association rules mining is one of the popular techniques of data mining to extract hidden relationship from transactional databases. The algorithm for finding association dependencies is well-solved with Map Reduce. The goal of our work is to reduce the time of generating of frequent itemsets by using Map Reduce and NOSQL database oriented document. A comparative study is given to evaluate the performances of our algorithm with the classical algorithm Apriori.
Abstract: Environmental changes and major natural disasters are
most prevalent in the world due to the damage that humanity has
caused to nature and these damages directly affect the lives of
animals. Thus, the study of animal behavior and their interactions
with the environment can provide knowledge that guides researchers
and public agencies in preservation and conservation actions.
Exploratory analysis of animal movement can determine the patterns
of animal behavior and with technological advances the ability of
animals to be tracked and, consequently, behavioral studies have
been expanded. There is a lot of research on animal movement and
behavior, but we note that a proposal that combines resources and
allows for exploratory analysis of animal movement and provide
statistical measures on individual animal behavior and its interaction
with the environment is missing. The contribution of this paper is
to present the framework AniMoveMineR, a unified solution that
aggregates trajectory analysis and data mining techniques to explore
animal movement data and provide a first step in responding questions
about the animal individual behavior and their interactions with other
animals over time and space. We evaluated the framework through the
use of monitored jaguar data in the city of Miranda Pantanal, Brazil,
in order to verify if the use of AniMoveMineR allows to identify the
interaction level between these jaguars. The results were positive and
provided indications about the individual behavior of jaguars and
about which jaguars have the highest or lowest correlation.
Abstract: This study aims to analyze the text terms posted on an online community of people from the same hometown and to understand the topic and trend of nostalgia composed online. For this purpose, this study collected 144 writings which the natives of Yeongjong Island, Incheon, South-Korea have posted on an online community. And it analyzed association relations. As a result, online community texts means that just defining nostalgia as ‘a mind longing for hometown’ is not an enough explanation. Second, texts composed online have abstractness rather than persons’ individual stories. This study figured out the relationship that had the most critical and closest mutual association among the terms that constituted nostalgia through literature research and association rule concerning nostalgia. The result of this study has a characteristic that it summed up the core terms and emotions related to nostalgia.
Abstract: Recommender Systems have been developed to provide contents and services compatible to users based on their behaviors and interests. Due to information overload in online discussion forums and users diverse interests, recommending relative topics and threads is considered to be helpful for improving the ease of forum usage. In order to lead learners to find relevant information in educational forums, recommendations are even more needed. We present a hybrid thread recommender system for MOOC forums by applying social network analysis and association rule mining techniques. Initial results indicate that the proposed recommender system performs comparatively well with regard to limited available data from users' previous posts in the forum.
Abstract: Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships; or to predict unseen objects to one of the predefined groups. In this paper, we aim to investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed other algorithms in terms of classification accuracy, precision and F1 evaluation measures using the datasets of groundwater areas that were collected from Jordanian Ministry of Water and Irrigation.
Abstract: Construction defects are major components that result in negative impacts on project performance including schedule delays and cost overruns. Since construction defects generally occur when a few associated causes combine, a thorough understanding of defect causality is required in order to more systematically prevent construction defects. To address this issue, this paper uses association rule mining (ARM) to quantify the causality between defect causes, and social network analysis (SNA) to find indirect causality among them. The suggested approach is validated with 350 defect instances from concrete works in 32 projects in Korea. The results show that the interrelationships revealed by the approach reflect the characteristics of the concrete task and the important causes that should be prevented.
Abstract: People, throughout the history, have made estimates
and inferences about the future by using their past experiences.
Developing information technologies and the improvements in the
database management systems make it possible to extract useful
information from knowledge in hand for the strategic decisions.
Therefore, different methods have been developed. Data mining by
association rules learning is one of such methods. Apriori algorithm,
one of the well-known association rules learning algorithms, is not
commonly used in spatio-temporal data sets. However, it is possible
to embed time and space features into the data sets and make Apriori
algorithm a suitable data mining technique for learning spatiotemporal
association rules. Lake Van, the largest lake of Turkey, is a
closed basin. This feature causes the volume of the lake to increase or
decrease as a result of change in water amount it holds. In this study,
evaporation, humidity, lake altitude, amount of rainfall and
temperature parameters recorded in Lake Van region throughout the
years are used by the Apriori algorithm and a spatio-temporal data
mining application is developed to identify overflows and newlyformed
soil regions (underflows) occurring in the coastal parts of
Lake Van. Identifying possible reasons of overflows and underflows
may be used to alert the experts to take precautions and make the
necessary investments.
Abstract: In order to help the expert to validate association rules
extracted from data, some quality measures are proposed in the
literature. We distinguish two categories: objective and subjective
measures. The first one depends on a fixed threshold and on data
quality from which the rules are extracted. The second one consists
on providing to the expert some tools in the objective to explore and
visualize rules during the evaluation step. However, the number of
extracted rules to validate remains high. Thus, the manually mining
rules task is very hard. To solve this problem, we propose, in this
paper, a semi-automatic method to assist the expert during the
association rule's validation. Our method uses rule-based
classification as follow: (i) We transform association rules into
classification rules (classifiers), (ii) We use the generated classifiers
for data classification. (iii) We visualize association rules with their
quality classification to give an idea to the expert and to assist him
during validation process.
Abstract: Association rule mining is one of the most important fields of data mining and knowledge discovery. In this paper, we propose an efficient multiple support frequent pattern growth algorithm which we called “MSFP-growth” that enhancing the FPgrowth algorithm by making infrequent child node pruning step with multiple minimum support using maximum constrains. The algorithm is implemented, and it is compared with other common algorithms: Apriori-multiple minimum supports using maximum constraints and FP-growth. The experimental results show that the rule mining from the proposed algorithm are interesting and our algorithm achieved better performance than other algorithms without scarifying the accuracy.
Abstract: The use of eXtensible Markup Language (XML) in
web, business and scientific databases lead to the development of
methods, techniques and systems to manage and analyze XML data.
Semi-structured documents suffer due to its heterogeneity and
dimensionality. XML structure and content mining represent
convergence for research in semi-structured data and text mining. As
the information available on the internet grows drastically, extracting
knowledge from XML documents becomes a harder task. Certainly,
documents are often so large that the data set returned as answer to a
query may also be very big to convey the required information. To
improve the query answering, a Semantic Tree Based Association
Rule (STAR) mining method is proposed. This method provides
intentional information by considering the structure, content and the
semantics of the content. The method is applied on Reuter’s dataset
and the results show that the proposed method outperforms well.
Abstract: Recommendation systems are widely used in
e-commerce applications. The engine of a current recommendation
system recommends items to a particular user based on user
preferences and previous high ratings. Various recommendation
schemes such as collaborative filtering and content-based approaches
are used to build a recommendation system. Most of current
recommendation systems were developed to fit a certain domain such
as books, articles, and movies. We propose1 a hybrid framework
recommendation system to be applied on two dimensional spaces
(User × Item) with a large number of Users and a small number
of Items. Moreover, our proposed framework makes use of both
favorite and non-favorite items of a particular user. The proposed
framework is built upon the integration of association rules mining
and the content-based approach. The results of experiments show
that our proposed framework can provide accurate recommendations
to users.
Abstract: Frequent pattern mining is the process of finding a
pattern (a set of items, subsequences, substructures, etc.) that occurs
frequently in a data set. It was proposed in the context of frequent
itemsets and association rule mining. Frequent pattern mining is used
to find inherent regularities in data. What products were often
purchased together? Its applications include basket data analysis,
cross-marketing, catalog design, sale campaign analysis, Web log
(click stream) analysis, and DNA sequence analysis. However, one of
the bottlenecks of frequent itemset mining is that as the data increase
the amount of time and resources required to mining the data
increases at an exponential rate. In this investigation a new algorithm
is proposed which can be uses as a pre-processor for frequent itemset
mining. FASTER (FeAture SelecTion using Entropy and Rough sets)
is a hybrid pre-processor algorithm which utilizes entropy and roughsets
to carry out record reduction and feature (attribute) selection
respectively. FASTER for frequent itemset mining can produce a
speed up of 3.1 times when compared to original algorithm while
maintaining an accuracy of 71%.
Abstract: In this paper, we present a recommendation library application on Android system. The objective of this system is to support and advice user to use library resources based on mobile application. We describe the design approaches and functional components of this system. The system was developed based on under association rules, Apriori algorithm. In this project, it was divided the result by the research purposes into 2 parts: developing the Mobile application for online library service and testing and evaluating the system. Questionnaires were used to measure user satisfaction with system usability by specialists and users. The results were satisfactory both specialists and users.
Abstract: Associative classification (AC) is a data mining approach that combines association rule and classification to build classification models (classifiers). AC has attracted a significant attention from several researchers mainly because it derives accurate classifiers that contain simple yet effective rules. In the last decade, a number of associative classification algorithms have been proposed such as Classification based Association (CBA), Classification based on Multiple Association Rules (CMAR), Class based Associative Classification (CACA), and Classification based on Predicted Association Rule (CPAR). This paper surveys major AC algorithms and compares the steps and methods performed in each algorithm including: rule learning, rule sorting, rule pruning, classifier building, and class prediction.
Abstract: This research aims to create a model for analysis of student behavior using Library resources based on data mining technique in case of Suan Sunandha Rajabhat University. The model was created under association rules, Apriori algorithm. The results were found 14 rules and the rules were tested with testing data set and it showed that the ability of classify data was 79.24percent and the MSE was 22.91. The results showed that the user’s behavior model by using association rule technique can use to manage the library resources.
Abstract: This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.
Abstract: The exponential increase in the volume of medical image database has imposed new challenges to clinical routine in maintaining patient history, diagnosis, treatment and monitoring. With the advent of data mining and machine learning techniques it is possible to automate and/or assist physicians in clinical diagnosis. In this research a medical image classification framework using data mining techniques is proposed. It involves feature extraction, feature selection, feature discretization and classification. In the classification phase, the performance of the traditional kNN k nearest neighbor classifier is improved using a feature weighting scheme and a distance weighted voting instead of simple majority voting. Feature weights are calculated using the interestingness measures used in association rule mining. Experiments on the retinal fundus images show that the proposed framework improves the classification accuracy of traditional kNN from 78.57 % to 92.85 %.
Abstract: In data mining, the association rules are used to search
for the relations of items of the transactions database. Following the
data is collected and stored, it can find rules of value through
association rules, and assist manager to proceed marketing strategy
and plan market framework. In this paper, we attempt fuzzy partition
methods and decide membership function of quantitative values of
each transaction item. Also, by managers we can reflect the
importance of items as linguistic terms, which are transformed as
fuzzy sets of weights. Next, fuzzy weighted frequent pattern growth
(FWFP-Growth) is used to complete the process of data mining. The
method above is expected to improve Apriori algorithm for its better
efficiency of the whole association rules. An example is given to
clearly illustrate the proposed approach.
Abstract: With a surge of stream processing applications novel
techniques are required for generation and analysis of association
rules in streams. The traditional rule mining solutions cannot handle
streams because they generally require multiple passes over the data
and do not guarantee the results in a predictable, small time. Though
researchers have been proposing algorithms for generation of rules
from streams, there has not been much focus on their analysis.
We propose Association rule profiling, a user centric process for
analyzing association rules and attaching suitable profiles to them
depending on their changing frequency behavior over a previous
snapshot of time in a data stream.
Association rule profiles provide insights into the changing nature
of associations and can be used to characterize the associations. We
discuss importance of characteristics such as predictability of
linkages present in the data and propose metric to quantify it. We
also show how association rule profiles can aid in generation of user
specific, more understandable and actionable rules.
The framework is implemented as SUPAR: System for Usercentric
Profiling of Association Rules in streaming data. The
proposed system offers following capabilities:
i) Continuous monitoring of frequency of streaming item-sets
and detection of significant changes therein for association rule
profiling.
ii) Computation of metrics for quantifying predictability of
associations present in the data.
iii) User-centric control of the characterization process: user
can control the framework through a) constraint specification and b)
non-interesting rule elimination.
Abstract: In this paper the application of rule mining in order to
review the effective factors on supplier selection is reviewed in the
following three sections 1) criteria selecting and information
gathering 2) performing association rule mining 3) validation and
constituting rule base. Afterwards a few of applications of rule base
is explained. Then, a numerical example is presented and analyzed
by Clementine software. Some of extracted rules as well as the
results are presented at the end.