Abstract: Data mining idea is mounting rapidly in admiration
and also in their popularity. The foremost aspire of data mining
method is to extract data from a huge data set into several forms that
could be comprehended for additional use. The data mining is a
technology that contains with rich potential resources which could be
supportive for industries and businesses that pay attention to collect
the necessary information of the data to discover their customer’s
performances. For extracting data there are several methods are
available such as Classification, Clustering, Association,
Discovering, and Visualization… etc., which has its individual and
diverse algorithms towards the effort to fit an appropriate model to
the data. STATISTICA mostly deals with excessive groups of data
that imposes vast rigorous computational constraints. These results
trials challenge cause the emergence of powerful STATISTICA Data
Mining technologies. In this survey an overview of the STATISTICA
software is illustrated along with their significant features.
Abstract: In this paper, we used data mining to extract
biomedical knowledge. In general, complex biomedical data
collected in studies of populations are treated by statistical methods,
although they are robust, they are not sufficient in themselves to
harness the potential wealth of data. For that you used in step two
learning algorithms: the Decision Trees and Support Vector Machine
(SVM). These supervised classification methods are used to make the
diagnosis of thyroid disease. In this context, we propose to promote
the study and use of symbolic data mining techniques.
Abstract: One image is worth more than thousand words.
Images if analyzed can reveal useful information. Low level image
processing deals with the extraction of specific feature from a single
image. Now the question arises: What technique should be used to
extract patterns of very large and detailed image database? The
answer of the question is: “Image Mining”. Image Mining deals with
the extraction of image data relationship, implicit knowledge, and
another pattern from the collection of images or image database. It is
nothing but the extension of Data Mining. In the following paper, not
only we are going to scrutinize the current techniques of image
mining but also present a new technique for mining images using
Genetic Algorithm.
Abstract: Clustering involves the partitioning of n objects into k
clusters. Many clustering algorithms use hard-partitioning techniques
where each object is assigned to one cluster. In this paper we propose
an overlapping algorithm MCOKE which allows objects to belong to
one or more clusters. The algorithm is different from fuzzy clustering
techniques because objects that overlap are assigned a membership
value of 1 (one) as opposed to a fuzzy membership degree. The
algorithm is also different from other overlapping algorithms that
require a similarity threshold be defined a priori which can be
difficult to determine by novice users.
Abstract: Safety is one of the most important considerations
when buying a new car. While active safety aims at avoiding
accidents, passive safety systems such as airbags and seat belts
protect the occupant in case of an accident. In addition to legal
regulations, organizations like Euro NCAP provide consumers with
an independent assessment of the safety performance of cars and
drive the development of safety systems in automobile industry.
Those ratings are mainly based on injury assessment reference values
derived from physical parameters measured in dummies during a car
crash test.
The components and sub-systems of a safety system are designed
to achieve the required restraint performance. Sled tests and other
types of tests are then carried out by car makers and their suppliers
to confirm the protection level of the safety system. A Knowledge
Discovery in Databases (KDD) process is proposed in order to
minimize the number of tests. The KDD process is based on the
data emerging from sled tests according to Euro NCAP specifications.
About 30 parameters of the passive safety systems from different data
sources (crash data, dummy protocol) are first analysed together with
experts opinions. A procedure is proposed to manage missing data
and validated on real data sets. Finally, a procedure is developed to
estimate a set of rough initial parameters of the passive system before
testing aiming at reducing the number of tests.
Abstract: Recommendation systems are widely used in
e-commerce applications. The engine of a current recommendation
system recommends items to a particular user based on user
preferences and previous high ratings. Various recommendation
schemes such as collaborative filtering and content-based approaches
are used to build a recommendation system. Most of current
recommendation systems were developed to fit a certain domain such
as books, articles, and movies. We propose1 a hybrid framework
recommendation system to be applied on two dimensional spaces
(User × Item) with a large number of Users and a small number
of Items. Moreover, our proposed framework makes use of both
favorite and non-favorite items of a particular user. The proposed
framework is built upon the integration of association rules mining
and the content-based approach. The results of experiments show
that our proposed framework can provide accurate recommendations
to users.
Abstract: Existing methods of data mining cannot be applied on
spatial data because they require spatial specificity consideration, as
spatial relationships.
This paper focuses on the classification with decision trees, which
are one of the data mining techniques. We propose an extension of
the C4.5 algorithm for spatial data, based on two different approaches
Join materialization and Querying on the fly the different tables.
Similar works have been done on these two main approaches, the
first - Join materialization - favors the processing time in spite of
memory space, whereas the second - Querying on the fly different
tables- promotes memory space despite of the processing time.
The modified C4.5 algorithm requires three entries tables: a target
table, a neighbor table, and a spatial index join that contains the
possible spatial relationship among the objects in the target table and
those in the neighbor table. Thus, the proposed algorithms are applied
to a spatial data pattern in the accidentology domain.
A comparative study of our approach with other works of
classification by spatial decision trees will be detailed.
Abstract: Nowadays, the Web has become one of the most
pervasive platforms for information change and retrieval. It collects
the suitable and perfectly fitting information from websites that one
requires. Data mining is the form of extracting data’s available in the
internet. Web mining is one of the elements of data mining
Technique, which relates to various research communities such as
information recovery, folder managing system and simulated
intellects. In this Paper we have discussed the concepts of Web
mining. We contain generally focused on one of the categories of
Web mining, specifically the Web Content Mining and its various
farm duties. The mining tools are imperative to scanning the many
images, text, and HTML documents and then, the result is used by
the various search engines. We conclude by presenting a comparative
table of these tools based on some pertinent criteria.
Abstract: GRF, Growth regulating factor, genes encode a novel
class of plant-specific transcription factors. The GRF proteins play a
role in the regulation of cell numbers in young and growing tissues
and may act as transcription activations in growth and development
of plants. Identification of GRF genes and their expression are
important in plants to performance of the growth and development of
various organs. In this study, to better understanding the structural
and functional differences of GRFs family, 45 GRF proteins
sequences in A. thaliana, Z. mays, O. sativa, B. napus, B. rapa, H.
vulgare and S. bicolor, have been collected and analyzed through
bioinformatics data mining. As a result, in secondary structure of
GRFs, the number of alpha helices was more than beta sheets and in
all of them QLQ domains were completely in the biggest alpha helix.
In all GRFs, QLQ and WRC domains were completely protected
except in AtGRF9. These proteins have no trans-membrane domain
and due to have nuclear localization signals act in nuclear and they
are component of unstable proteins in the test tube.
Abstract: An extensive amount of work has been done in data
clustering research under the unsupervised learning technique in Data
Mining during the past two decades. Moreover, several approaches
and methods have been emerged focusing on clustering diverse data
types, features of cluster models and similarity rates of clusters.
However, none of the single clustering algorithm exemplifies its best
nature in extracting efficient clusters. Consequently, in order to
rectify this issue, a new challenging technique called Cluster
Ensemble method was bloomed. This new approach tends to be the
alternative method for the cluster analysis problem. The main
objective of the Cluster Ensemble is to aggregate the diverse
clustering solutions in such a way to attain accuracy and also to
improve the eminence the individual clustering algorithms. Due to
the massive and rapid development of new methods in the globe of
data mining, it is highly mandatory to scrutinize a vital analysis of
existing techniques and the future novelty. This paper shows the
comparative analysis of different cluster ensemble methods along
with their methodologies and salient features. Henceforth this
unambiguous analysis will be very useful for the society of clustering
experts and also helps in deciding the most appropriate one to resolve
the problem in hand.
Abstract: There have been a lot of efforts and researches undertaken in developing efficient tools for performing several tasks in data mining. Due to the massive amount of information embedded in huge data warehouses maintained in several domains, the extraction of meaningful pattern is no longer feasible. This issue turns to be more obligatory for developing several tools in data mining. Furthermore the major aspire of data mining software is to build a resourceful predictive or descriptive model for handling large amount of information more efficiently and user friendly. Data mining mainly contracts with excessive collection of data that inflicts huge rigorous computational constraints. These out coming challenges lead to the emergence of powerful data mining technologies. In this survey a diverse collection of data mining tools are exemplified and also contrasted with the salient features and performance behavior of each tool.
Abstract: Machine learning represents a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. The data can present identification patterns which are used to classify into groups. The result of the analysis is the pattern which can be used for identification of data set without the need to obtain input data used for creation of this pattern. An important requirement in this process is careful data preparation validation of model used and its suitable interpretation. For breeders, it is important to know the origin of animals from the point of the genetic diversity. In case of missing pedigree information, other methods can be used for traceability of animal´s origin. Genetic diversity written in genetic data is holding relatively useful information to identify animals originated from individual countries. We can conclude that the application of data mining for molecular genetic data using supervised learning is an appropriate tool for hypothesis testing and identifying an individual.
Abstract: Creative tourism is an ongoing development in many countries as an attempt to moving away from serial reproduction of culture and reviving the culture. Despite, in the destinations with diverse and potential cultural resources, creating new tourism services can be vague. This paper presents how tourism experiences are modularized and consolidated in order to form new creative tourism service offerings in evanescent cultural communities of Bangkok, Thailand. The benefits from data mining in accommodating value co-creation are discussed, and implication of experience modularization to national creative tourism policy is addressed.
Abstract: Textual data plays an important role in the modern
world. The possibilities of applying data mining techniques to
uncover hidden information present in large volumes of text
collections is immense. The Growing Self Organizing Map (GSOM)
is a highly successful member of the Self Organising Map family
and has been used as a clustering and visualisation tool across wide
range of disciplines to discover hidden patterns present in the data.
A comprehensive analysis of the GSOM’s capabilities as a text
clustering and visualisation tool has so far not been published. These
functionalities, namely map visualisation capabilities, automatic
cluster identification and hierarchical clustering capabilities are
presented in this paper and are further demonstrated with experiments
on a benchmark text corpus.
Abstract: In the era of rapidly increasing notebook market, consumer electronics manufacturers are facing a highly dynamic and competitive environment. In particular, the product appearance is the first part for user to distinguish the product from the product of other brands. Notebook product should differ in its appearance to engage users and contribute to the user experience (UX). The UX evaluates various product concepts to find the design for user needs; in addition, help the designer to further understand the product appearance preference of different market segment. However, few studies have been done for exploring the relationship between consumer background and the reaction of product appearance. This study aims to propose a data mining framework to capture the user’s information and the important relation between product appearance factors. The proposed framework consists of problem definition and structuring, data preparation, rules generation, and results evaluation and interpretation. An empirical study has been done in Taiwan that recruited 168 subjects from different background to experience the appearance performance of 11 different portable computers. The results assist the designers to develop product strategies based on the characteristics of consumers and the product concept that related to the UX, which help to launch the products to the right customers and increase the market shares. The results have shown the practical feasibility of the proposed framework.
Abstract: This paper investigates a new data mining capability that entails mining of High Utility Itemsets (HUI) in a distributed environment. Existing research in data mining deals with only presence or absence of an items and do not consider the semantic measures like weight or cost of the items. Thus, HUI mining algorithm has evolved. HUI mining is the one kind of utility mining concept, aims to identify itemsets whose utility satisfies a given threshold. Although, the approach of mining HUIs in a distributed environment and mining of the same from XML data have not explored yet. In this work, a novel approach is proposed to mine HUIs from the XML based data in a distributed environment. This work utilizes Service Oriented Computing (SOC) paradigm which provides Knowledge as a Service (KaaS). The interesting patterns are provided via the web services with the help of knowledge server to answer the queries of the consumers. The performance of the approach is evaluated on various databases using execution time and memory consumption.
Abstract: The high utilization rate of Automated Teller Machine (ATM) has inevitably caused the phenomena of waiting for a long time in the queue. This in turn has increased the out of stock situations. The ATM utilization helps to determine the usage level and states the necessity of the ATM based on the utilization of the ATM system. The time in which the ATM used more frequently (peak time) and based on the predicted solution the necessary actions are taken by the bank management. The analysis can be done by using the concept of Data Mining and the major part are analyzed based on the predictive data mining. The results are predicted from the historical data (past data) and track the relevant solution which is required. Weka tool is used for the analysis of data based on predictive data mining.
Abstract: Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modeling and testing work was done in R and Greenplum in-database analytic tools.
Abstract: The application of data mining to environmental monitoring has become crucial for a number of tasks related to emergency management. Over recent years, many tools have been developed for decision support system (DSS) for emergency management. In this article a graphical user interface (GUI) for environmental monitoring system is presented. This interface allows accomplishing (i) data collection and observation and (ii) extraction for data mining. This tool may be the basis for future development along the line of the open source software paradigm.
Abstract: Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.