Abstract: In the present study, the self-organizing map (SOM) clustering technique was applied to identify homogeneous clusters of hydrochemical parameters in the El Milia plain, Algeria, in order to assess the quality of groundwater for drinking and agricultural purposes. Visualization of the SOM analysis indicated that the 35 groundwater samples collected in the study area fall into three clusters, which show a progressive increase in electrical conductivity from cluster one to cluster three. Samples belonging to cluster one are mostly located in the recharge zone and are of the hard fresh water type; the water type gradually changes to hard brackish in the discharge zone, which includes clusters two and three. Ionic ratio studies indicated that carbonate rock dissolution contributes to increased groundwater hardness, especially in cluster one, whereas evaporation and evapotranspiration are the main processes increasing salinity in clusters two and three.
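The abstract does not give the SOM training details. The following is a minimal, generic online SOM in pure Python, assuming one standardized feature vector per groundwater sample (z-scoring matters because hydrochemical variables come in different units). The function names, grid size, linearly decaying learning rate and Gaussian neighborhood are illustrative assumptions, not the authors' settings.

```python
import math
import random
import statistics

# Hypothetical helpers for illustration; not taken from the study.

def standardize(rows):
    """Z-score each column so variables measured in different units weigh equally."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def best_matching_unit(weights, x):
    """Return the grid position (r, c) whose code vector is closest to x."""
    return min(weights, key=lambda u: math.dist(weights[u], x))

def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rectangular SOM; returns {(r, c): code vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialize every unit with a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood on the grid
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

Samples that share a best-matching unit (or neighboring units) form candidate groups; a common follow-up, not necessarily the one used in this study, is to cluster the trained code vectors themselves to obtain a small number of final clusters.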
Abstract: Given the growing number of e-commerce sites, competition has become intense. Companies therefore have to make appropriate decisions in order to meet their customers' expectations and satisfy their needs. In this paper, we present a case study applying the LRFM (length, recency, frequency and monetary) model and clustering techniques in the electronic commerce sector, with a view to evaluating the value of customers of Moroccan e-commerce websites and then developing effective marketing strategies. To achieve these objectives, we apply the LRFM model with a two-stage clustering method. In the first stage, the self-organizing map method is used to determine the best number of clusters and the initial centroids. In the second stage, the k-means method is applied to segment 730 customers into nine clusters according to their L, R, F and M values. The results show that cluster 6 is the most important cluster because its average L, R, F and M values are higher than the overall averages. In addition, this study considers another variable, the payment method used by customers, to improve and strengthen the cluster analysis. The cluster analysis demonstrates that the payment method is one of the key indicators of a new index for assessing the level of customers' confidence in the company's website.
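A minimal sketch of the second stage, assuming transactions arrive as `(customer_id, purchase_date, amount)` rows and that the initial centroids are supplied by the SOM stage. The LRFM definitions below (L as days between first and last purchase, R as days since the last purchase, M as total spend) are one common convention and an assumption here; depending on the convention, R may be inverted so that larger values mean more recent customers. All function names are hypothetical.

```python
from datetime import date

# Hypothetical helpers for illustration; not taken from the paper.

def lrfm(transactions, ref_date):
    """Compute (L, R, F, M) per customer from (customer_id, purchase_date, amount) rows."""
    by_customer = {}
    for cid, day, amount in transactions:
        by_customer.setdefault(cid, []).append((day, amount))
    features = {}
    for cid, rows in by_customer.items():
        days = [d for d, _ in rows]
        features[cid] = (
            (max(days) - min(days)).days,  # L: days between first and last purchase
            (ref_date - max(days)).days,   # R: days since the last purchase
            len(rows),                     # F: number of purchases
            sum(a for _, a in rows),       # M: total amount spent
        )
    return features

def min_max(rows):
    """Scale each column to [0, 1] so no single variable dominates the distance."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)] for r in rows]

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means started from given centroids (e.g. SOM code vectors)."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - q) ** 2 for p, q in zip(pt, centroids[k])))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In use, the feature tuples from `lrfm` would be scaled with `min_max` and passed to `kmeans` together with the SOM-derived starting centroids; seeding k-means this way fixes both the number of clusters and the starting point, which removes the run-to-run variation of random initialization.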
Abstract: The paper presents the results of clustering, with Kohonen self-organizing maps (SOM), an array of Raman spectra of multi-component solutions of inorganic salts in order to determine the types of salts present in each solution. It is demonstrated that SOM is a promising method for clustering and classification problems in the spectroscopy of multicomponent objects, since attributing a pattern to a cluster can be used to recognize the component composition of the object.
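Recognizing composition by cluster membership can be sketched as follows, assuming an already trained codebook (one code vector per map unit, trained on unit-norm spectra) and training spectra whose salt composition is known. Each unit takes the majority label of the spectra it wins, and a new spectrum is assigned the label of its best-matching unit. The normalization step, the majority-vote labeling and all names are assumptions for illustration, not the authors' procedure.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration; not taken from the paper.

def normalize(spectrum):
    """Scale a spectrum to unit Euclidean norm so overall intensity does not matter."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum] if norm else list(spectrum)

def bmu(codebook, x):
    """Index of the code vector closest to x."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], x))

def label_units(codebook, spectra, labels):
    """Give each unit the most frequent composition label among the spectra it wins."""
    votes = {}
    for s, lab in zip(spectra, labels):
        votes.setdefault(bmu(codebook, normalize(s)), Counter())[lab] += 1
    return {unit: c.most_common(1)[0][0] for unit, c in votes.items()}

def classify(codebook, unit_labels, spectrum):
    """Composition label of the spectrum's BMU, or None if that unit was never labeled."""
    return unit_labels.get(bmu(codebook, normalize(spectrum)))
```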
Abstract: We present a non-standard Euclidean vehicle routing problem with an additional level of clustering, and we revisit the use of self-organizing maps as a tool that naturally handles such problems. We show how they can be used as the main operator within an evolutionary algorithm to address two conflicting minimization objectives, route length and the distance from customers to bus stops, and to deal with capacity constraints. We apply the approach to a real-life case of combined clustering and vehicle routing for the transportation of the 780 employees of an enterprise. Using a geographic information system, we discuss the influence of road infrastructure on the generated solutions.
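The reason SOMs suit routing is that a one-dimensional ring of neurons, stretched over the customers, behaves like an elastic tour. The sketch below is the classic single-route "elastic ring" SOM heuristic for a Euclidean tour; it is a deliberately simplified stand-in that omits the paper's clustering level, bus stops, capacity constraints and evolutionary loop. The ring size, schedules and the name `som_tour` are assumptions.

```python
import math
import random

def som_tour(cities, n_neurons=None, iters=5000, lr0=0.8, seed=0):
    """Return a visiting order (list of city indices) from a ring-shaped SOM."""
    rng = random.Random(seed)
    n = n_neurons or 3 * len(cities)
    # Start the ring as a small circle around the centroid of the cities.
    cx = sum(x for x, _ in cities) / len(cities)
    cy = sum(y for _, y in cities) / len(cities)
    ring = [[cx + 0.1 * math.cos(2 * math.pi * i / n),
             cy + 0.1 * math.sin(2 * math.pi * i / n)] for i in range(n)]
    radius0 = n / 8.0
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1 - frac)                 # decaying learning rate
        radius = max(radius0 * (1 - frac), 1.0)
        x, y = rng.choice(cities)
        winner = min(range(n), key=lambda i: (ring[i][0] - x) ** 2 + (ring[i][1] - y) ** 2)
        for i in range(n):
            d = abs(i - winner)
            d = min(d, n - d)                 # neighborhood distance measured along the ring
            h = math.exp(-(d * d) / (2 * radius * radius))
            ring[i][0] += lr * h * (x - ring[i][0])
            ring[i][1] += lr * h * (y - ring[i][1])

    def nearest(c):
        return min(range(n), key=lambda i: (ring[i][0] - c[0]) ** 2 + (ring[i][1] - c[1]) ** 2)

    # Visit the cities in the order of their closest neurons around the ring.
    return sorted(range(len(cities)), key=lambda k: nearest(cities[k]))
```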
Abstract: The self-organizing map (SOM) is a well-known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for automatically detecting clusters of the code vectors found by SOMs, but they generally do not take the distribution of code vectors into account; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The PSO algorithm uses the U-matrix of the SOM to determine cluster boundaries. Applying our method to unlabeled call data from a mobile phone operator demonstrates its feasibility: the results of this novel automatic method correspond well to boundary detection by visual inspection of the code vectors and by the k-means algorithm.
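The U-matrix that the PSO search works on can be computed directly from the code vectors. This sketch uses one common variant, the mean Euclidean distance from each unit to its 4-connected grid neighbors; high values mark units lying between dissimilar regions, i.e. likely cluster boundaries. The PSO boundary search itself is not reproduced, and the function name and grid layout (`weights[r][c]`) are assumptions.

```python
import math

def u_matrix(weights):
    """weights[r][c] is the code vector of unit (r, c) on a rectangular SOM grid."""
    rows, cols = len(weights), len(weights[0])
    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(math.dist(weights[r][c], weights[rr][cc]))
            # Average distance to the existing neighbors (0.0 for a 1x1 map).
            umat[r][c] = sum(dists) / len(dists) if dists else 0.0
    return umat
```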
Abstract: Expression data analysis is based mostly on statistical approaches, which are indispensable for the study of biological systems. The large amounts of multidimensional data produced by high-throughput technologies are not fully served by biostatistical techniques, which are therefore usually complemented with visualization, knowledge discovery and other computational tools. In many biological systems we can only speculate about the processes causing the changes, and hypotheses are often formed during visual exploratory analysis of the data. We would like to show the usefulness of multidimensional visualization tools and promote their use in the life sciences. We survey and demonstrate several multidimensional visualization tools for data exploration, such as parallel coordinates and radviz, and extend them by combining them with the self-organizing map algorithm. Our examples use a time-course data set of transitional cell carcinoma of the bladder. Analyzing data with these tools has the potential to uncover additional relationships and non-trivial structures.
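Radviz places one anchor per dimension evenly on the unit circle and draws each record at the weighted average of the anchors, using the record's min-max normalized values as weights. The sketch below implements that projection; it could be applied to raw records or to SOM code vectors, but how the authors combine radviz with the SOM is not specified, and the function name is an assumption.

```python
import math

def radviz(rows):
    """Project rows (equal-length numeric lists) to 2-D radviz coordinates."""
    dims = len(rows[0])
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    # One anchor per dimension, evenly spaced on the unit circle.
    anchors = [(math.cos(2 * math.pi * j / dims), math.sin(2 * math.pi * j / dims))
               for j in range(dims)]
    points = []
    for row in rows:
        # Min-max normalize each value to [0, 1].
        v = [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
        s = sum(v)
        if s == 0:
            points.append((0.0, 0.0))  # an all-zero record sits at the center
            continue
        px = sum(vj * a[0] for vj, a in zip(v, anchors)) / s
        py = sum(vj * a[1] for vj, a in zip(v, anchors)) / s
        points.append((px, py))
    return points
```

A record dominated by one variable lands near that variable's anchor, while balanced records collapse toward the center, which is why radviz is often paired with another view such as parallel coordinates.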
Abstract: Biological data has several characteristics that strongly differentiate it from typical business data: it is much more complex, usually large, and continuously changing. Until recently, business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise of biotechnology, the powerful technology once used for analyzing business data is now being applied to biological data. With this technology at hand, the main focus of biological research is rapidly shifting from structural DNA analysis to understanding the cellular functions of DNA sequences. Researchers now use DNA chips to perform experiments and DNA analysis processes. Clustering is one of the important processes used for grouping similar entities together, and there are many clustering algorithms, such as hierarchical clustering, self-organizing maps and K-means clustering. In this paper, we propose a clustering algorithm that imitates an ecosystem, taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm, and the system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm on test data of 100 to 1000 genes and 24 samples, and the results are promising for applying this algorithm to clustering DNA chip data.
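The abstract does not describe its ecosystem-inspired rules, so the following shows only the classic ant-based sorting scheme that ant-colony clustering is usually traced to: items lie on a toroidal grid, ants wander randomly, pick up items that are dissimilar to their surroundings and drop them among similar ones. The similarity measure follows the Lumer and Faieta style and the pick/drop probabilities follow Deneubourg's form; the parameter values (`k1`, `k2`, `alpha`, grid size) and all names are illustrative assumptions. Extracting clusters from the final layout (e.g. as spatially connected groups) and the paper's automatic cluster count and Topic Map are not reproduced.

```python
import math
import random

def local_density(grid, items, item, cell, size, s=3, alpha=1.0):
    """Similarity f of `item` to the items in the s x s patch around `cell` (torus)."""
    half, total = s // 2, 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            other = grid.get(((cell[0] + dr) % size, (cell[1] + dc) % size))
            if (dr or dc) and other is not None:  # skip the cell itself
                total += 1.0 - math.dist(items[item], items[other]) / alpha
    return max(0.0, total / (s * s))

def p_pick(f, k1=0.1):
    return (k1 / (k1 + f)) ** 2  # isolated items (small f) are likely to be picked up

def p_drop(f, k2=0.15):
    return (f / (k2 + f)) ** 2   # carried items are likely to be dropped among similar ones

def ant_cluster(items, size=20, n_ants=10, steps=2000, seed=0):
    """Return {grid cell: item index} after the ants have rearranged the items."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    grid = {cell: i for i, cell in enumerate(rng.sample(cells, len(items)))}
    ants = [[rng.choice(cells), None] for _ in range(n_ants)]  # [position, carried item]
    for _ in range(steps):
        for ant in ants:
            (r, c), load = ant
            cell = ((r + rng.choice((-1, 0, 1))) % size, (c + rng.choice((-1, 0, 1))) % size)
            here = grid.get(cell)
            if load is None and here is not None:
                if rng.random() < p_pick(local_density(grid, items, here, cell, size)):
                    load = grid.pop(cell)
            elif load is not None and here is None:
                if rng.random() < p_drop(local_density(grid, items, load, cell, size)):
                    grid[cell], load = load, None
            ant[0], ant[1] = cell, load
    # Put down anything still carried on the nearest free cell (ignoring wrap-around).
    for (r, c), load in ants:
        if load is not None:
            free = min((x for x in cells if x not in grid),
                       key=lambda x: (x[0] - r) ** 2 + (x[1] - c) ** 2)
            grid[free] = load
    return grid
```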
Abstract: The self-organizing map (SOM) provides both clustering and visualization capabilities in data mining. Dynamic self-organizing maps such as the Growing Self-Organizing Map (GSOM) have been developed to overcome the fixed structure of the SOM and enable better representation of the discovered patterns. However, when mining large datasets or historical data, the hierarchical structure of the data is also useful for viewing cluster formation at different levels of abstraction. In this paper, we present a technique to generate concept trees from the GSOM. The formation of trees from GSOMs built with different spread factor values is also investigated, and the quality of the trees is analyzed. The results show that concept trees can be generated from the GSOM, thus eliminating the need to re-cluster the data from scratch to obtain a hierarchical view of the data under study.
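The spread factor controls how far a GSOM grows through its growth threshold, commonly defined as GT = -D × ln(SF) for data dimensionality D and 0 < SF < 1. A low spread factor gives a high threshold and a coarse map, while a high spread factor gives a low threshold and a finer map, which is why maps grown at several spread factors can serve as different levels of a hierarchy. The sketch covers only this growth test; the function names are assumptions, and the error distribution used for non-boundary nodes is omitted.

```python
import math

def growth_threshold(dim, spread_factor):
    """GSOM growth threshold GT = -D * ln(SF), for 0 < SF < 1."""
    if not 0 < spread_factor < 1:
        raise ValueError("spread factor must lie strictly between 0 and 1")
    return -dim * math.log(spread_factor)

def should_grow(accumulated_error, dim, spread_factor):
    """A boundary node spawns new neighbors once its accumulated error exceeds GT."""
    return accumulated_error > growth_threshold(dim, spread_factor)
```

For four-dimensional data, a spread factor of 0.5 gives GT = 4 ln 2, roughly 2.77, so a boundary node with accumulated error 3.0 would grow while one with 2.0 would not.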
Abstract: The self-organizing map (SOM) is a well-known data reduction technique used in data mining. Through data visualization, it can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for automatically detecting clusters of the code vectors found by SOM, but they generally do not take the distribution of code vectors into account; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of an adaptive heuristic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOM. The application of our method to several standard data sets demonstrates its feasibility. The PSO algorithm uses the so-called U-matrix of SOM to determine cluster boundaries; the results of this novel automatic method compare very favorably with boundary detection by traditional algorithms, namely k-means and hierarchical clustering, which are normally used to interpret the output of SOM.
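For readers unfamiliar with PSO, the sketch below is the standard global-best variant with fixed inertia and acceleration coefficients, minimizing a toy objective. It is not the adaptive heuristic PSO proposed here, and it does not encode U-matrix boundaries; the coefficient values and the function name are assumptions chosen from a commonly used stable range.

```python
import random

def pso(objective, dim, bounds, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box [lo, hi]^dim; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # each particle's best position so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In a boundary-finding setting, each particle would instead encode a candidate boundary and the objective would reward boundaries that follow high U-matrix values; how the paper defines that encoding and adapts the coefficients is not stated in the abstract.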
Abstract: We report the results of a pilot study in which a data-mining tool was developed for mining audiology records. The records were heterogeneous in that they contained numeric, categorical and textual data. The tools developed are designed to detect associations between any field in the records and any other field. The techniques employed were the statistical chi-squared test and self-organizing maps, an unsupervised neural learning approach.
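Testing whether two record fields are associated can be sketched with Pearson's chi-squared statistic on their contingency table. The sketch assumes both fields are already categorical (numeric or textual fields would first need binning or keyword coding, which the abstract does not describe), and the function name is hypothetical. SciPy's `scipy.stats.chi2_contingency` performs the same test and also returns a p-value; note that it applies Yates' continuity correction to 2×2 tables by default.

```python
from collections import Counter

def chi_squared(pairs):
    """Pearson chi-squared statistic and degrees of freedom for (value_a, value_b) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)
    observed = Counter(pairs)  # missing combinations count as 0
    stat = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n  # count expected if the two fields were independent
            stat += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof
```

With one degree of freedom, a statistic above about 3.841 indicates an association at the 5% significance level.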
Abstract: The main focus of the paper is the integration of system process information, obtained through an image processing system, with an evolving knowledge database to improve the accuracy and predictability of wear debris analysis. The objective is to automate the analysis of wear particles intelligently, using classification via self-organizing maps. This is achieved using relationship measurements among corresponding attributes of the various wear debris measurements. Finally, a visualization technique is proposed that helps the viewer understand and use these relationships, which enable accurate diagnostics.