Abstract: As the number of e-commerce sites increases, so does the
number of competitors. This means that
companies have to take appropriate decisions in order to meet the
expectations of their customers and satisfy their needs. In this paper,
we present a case study of applying the LRFM (length, recency,
frequency and monetary) model and clustering techniques in the
electronic commerce sector, with a view to evaluating the value of
customers of Moroccan e-commerce websites and then developing
effective marketing strategies. To achieve these objectives, we adopt
the LRFM model and apply a two-stage clustering method. In the first
stage, the self-organizing maps method is used to determine the best
number of clusters and the initial centroids. In the second stage, the
k-means method is applied to segment 730 customers into nine clusters
according to their L, R, F and M values. The results show that
cluster 6 is the most important cluster because its average values of
L, R, F and M are higher than the overall average values. In addition,
this study considers another variable, the mode of
payment used by customers, to improve and strengthen the cluster
analysis. The cluster analysis demonstrates that the payment method is
one of the key indicators of a new index that makes it possible to
assess the level of customers’ confidence in the company's website.
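The LRFM values that feed the clustering can be sketched as follows. This is a minimal illustration, assuming the commonly used definitions (L: days between first and last purchase, R: days since the last purchase, F: number of purchases, M: total spend); the abstract does not state the exact definitions used, and the function name `lrfm` and the sample transactions are hypothetical.

```python
from datetime import date

def lrfm(transactions, today):
    """Compute (L, R, F, M) per customer from (customer_id, date, amount) rows.

    The definitions below are assumptions, not taken from the paper.
    """
    by_customer = {}
    for customer_id, day, amount in transactions:
        by_customer.setdefault(customer_id, []).append((day, amount))

    result = {}
    for customer_id, rows in by_customer.items():
        days = [day for day, _ in rows]
        first, last = min(days), max(days)
        result[customer_id] = (
            (last - first).days,                # L: length of the relationship
            (today - last).days,                # R: days since the last purchase
            len(rows),                          # F: number of purchases
            sum(amount for _, amount in rows),  # M: total amount spent
        )
    return result

# Hypothetical transactions, for illustration only.
transactions = [
    ("c1", date(2023, 1, 1), 100.0),
    ("c1", date(2023, 3, 1), 50.0),
    ("c2", date(2023, 2, 15), 20.0),
]
scores = lrfm(transactions, today=date(2023, 4, 1))
```

Each customer's four values would then typically be normalised before being passed to the clustering stage.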
Abstract: Leukaemia is a blood cancer that contributes
to the rising mortality rate in Malaysia each year. There are
two main categories of leukaemia: acute and chronic
leukaemia. The production and development of acute leukaemia cells
occur rapidly and uncontrollably. Therefore, if acute leukaemia
cells could be identified quickly and effectively, proper
treatment and medicine could be delivered. Due to the requirement of
prompt and accurate diagnosis of leukaemia, the current study
proposes unsupervised pixel segmentation based on clustering
algorithms in order to obtain a fully segmented abnormal white blood
cell (blast) in acute leukaemia images. To obtain the
segmented blast, three clustering
algorithms, namely k-means, fuzzy c-means and moving k-means,
are applied to the saturation component image.
Then, median filter and seeded region growing area extraction
algorithms are applied to smooth the region of the segmented
blast and to remove large unwanted regions from the image,
respectively. Comparisons among the three clustering algorithms are
made in order to measure the performance of each clustering
algorithm in segmenting the blast area. Based on the good sensitivity
value obtained, the results indicate that the moving k-means
clustering algorithm successfully produces a fully
segmented blast region in acute leukaemia images. This indicates
that the resultant images could be helpful to haematologists for
further analysis of acute leukaemia.
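The idea of clustering pixels on the saturation component can be sketched with plain k-means on scalar values. This is only an illustration under stated assumptions: it uses standard k-means rather than the moving k-means variant, it omits the median filter and region growing steps, and the pixel values and the helper names `saturation`, `assign` and `kmeans_1d` are hypothetical.

```python
import colorsys

def saturation(rgb):
    """Saturation (0..1) of an 8-bit RGB pixel, via the HSV colour model."""
    r, g, b = (channel / 255.0 for channel in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[1]

def assign(values, centers):
    """Index of the nearest center for every value."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values]

def kmeans_1d(values, centers, max_iterations=100):
    """Plain k-means on scalar values, starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iterations):
        labels = assign(values, centers)
        updated = []
        for j, old in enumerate(centers):
            members = [v for v, label in zip(values, labels) if label == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append(sum(members) / len(members) if members else old)
        if updated == centers:  # converged
            break
        centers = updated
    return assign(values, centers), centers

# Hypothetical pixels: pale background versus strongly coloured pixels.
pixels = [(200, 200, 210), (210, 205, 215), (120, 40, 160),
          (110, 30, 150), (205, 200, 205)]
sat = [saturation(p) for p in pixels]
labels, centers = kmeans_1d(sat, centers=[min(sat), max(sat)])
```

Here label 1 marks the high-saturation cluster; treating that cluster as the candidate blast region is an assumption of this sketch.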
Abstract: An automated wood recognition system is designed to
classify tropical wood species. The wood features are extracted using
two feature extractors: the Basic Grey Level Aura Matrix (BGLAM)
technique and the statistical properties of pores distribution (SPPD)
technique. Due to the nonlinearity of the tropical wood species
separation boundaries, a pre-classification stage is proposed which
consists of K-means clustering and kernel discriminant analysis (KDA).
Finally, a Linear Discriminant Analysis (LDA) classifier and a
K-Nearest Neighbour (KNN) classifier are implemented for comparison
purposes. The study compares the system with and without
pre-classification using the KNN and LDA classifiers. The results
show that the inclusion of the pre-classification stage improves
the accuracy of both the LDA and KNN classifiers by more than
12%.
Abstract: This paper presents a comparative analysis of a new
unsupervised PCA-based technique for steel plates texture segmentation
towards defect detection. The proposed scheme called Variance
Based Component Analysis or VBCA employs PCA for feature
extraction, applies a feature reduction algorithm based on variance of
eigenpictures and classifies the pixels as defective or normal. While
the classic PCA uses a clusterer like K-means for pixel clustering,
VBCA employs thresholding and some post-processing operations to
label pixels as defective or normal. The experimental results show
that the proposed VBCA algorithm is 12.46% more accurate and
78.85% faster than the classic PCA.
Abstract: A new dynamic clustering approach (DCPSO), based
on Particle Swarm Optimization, is proposed. This approach is
applied to unsupervised image classification. The proposed approach
automatically determines the "optimum" number of clusters and
simultaneously clusters the data set with minimal user interference.
The algorithm starts by partitioning the data set into a relatively large
number of clusters to reduce the effects of initial conditions. Using
binary particle swarm optimization, the "best" number of clusters is
selected. The centers of the chosen clusters are then refined via the
K-means clustering algorithm. The experiments conducted show that
the proposed approach generally found the "optimum" number of
clusters on the tested images.
Abstract: Image clustering is the process of grouping images
based on their similarity. Image clustering usually uses color,
texture, edge, or shape components, or a mixture of two components.
This research aims to explore image clustering using color
composition. In order to complete this image clustering, three main
components should be considered, which are color space, image
representation (feature extraction), and clustering method itself. We
aim to explore which composition of these factors will produce the
best clustering results by combining various techniques from the
three components. The color spaces considered are RGB, HSV, and
L*a*b*. The image representations are Histogram and Gaussian
Mixture Model (GMM), whereas the clustering methods are K-Means
and the Agglomerative Hierarchical Clustering algorithm. The
results of the experiment show that the GMM representation is better
combined with the RGB and L*a*b* color spaces, whereas Histogram is
better combined with HSV. The experiments also show that K-Means
is better than Agglomerative Hierarchical Clustering for image
clustering.
Abstract: In the literature, there are metrics for identifying the
quality of reusable components, but a framework that makes use of
these metrics to precisely predict the reusability of software
components still needs to be worked out. These reusability metrics, if
identified in the design phase or even in the coding phase, can help
to reduce rework by improving the quality of reuse of the software
component and hence improve productivity due to a probabilistic
increase in the reuse level. Because the CK metric suite is the most
widely used set of metrics for extracting structural features of
object-oriented (OO) software, this study uses a tuned CK metric
suite, i.e. WMC, DIT, NOC, CBO and LCOM, to obtain the structural
analysis of OO-based software components. An algorithm is proposed in
which the tuned metric values of the OO software component are given
as inputs to a K-Means clustering system, and a decision tree is
formed for 10-fold cross-validation of the data to evaluate the
component in terms of a linguistic reusability value. The developed
reusability model has produced high precision results as desired.
Abstract: Color image quantization (CQ) is an important
problem in computer graphics and image processing. The aim of
quantization is to reduce the number of colors in an image with
minimum distortion. Clustering is a widely used technique for color
quantization; all colors in an image are grouped into small clusters.
In this paper, we propose a new hybrid approach for color
quantization using the firefly algorithm (FA) and the K-means
algorithm. The firefly algorithm is a swarm-based
algorithm that can be used for solving optimization problems.
The proposed method can overcome the drawbacks of both
algorithms, such as convergence to local optima in K-means
and the premature convergence of the firefly algorithm. Experiments
on three commonly used images and the comparison results show that the
proposed algorithm surpasses both the baseline K-means
clustering and the original firefly algorithm.
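The quantity that any such quantizer tries to keep small can be sketched as follows: map each pixel to its nearest palette colour and measure the mean squared error. This is a minimal illustration of the objective only, not the hybrid firefly and K-means method; mean squared error as the distortion measure, the function name `quantize`, and the sample pixels and palette are assumptions.

```python
def quantize(pixels, palette):
    """Map each pixel to its nearest palette colour.

    Returns the quantized pixels and the mean squared error (distortion).
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    quantized = [min(palette, key=lambda colour: squared_distance(p, colour))
                 for p in pixels]
    mse = sum(squared_distance(p, q)
              for p, q in zip(pixels, quantized)) / len(pixels)
    return quantized, mse

# Hypothetical image (four pixels) and a two-colour palette.
pixels = [(250, 0, 0), (255, 10, 0), (0, 0, 250), (0, 5, 255)]
palette = [(255, 0, 0), (0, 0, 255)]
quantized, mse = quantize(pixels, palette)
```

A clustering-based quantizer would search for the palette (the cluster centers) that makes this error as small as possible.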
Abstract: This paper uses the radial basis function neural
network (RBFNN) for system identification of nonlinear systems.
Five nonlinear systems are used to examine the performance of the
RBFNN in system modelling of nonlinear systems: a
dual tank system, a single tank system, a DC motor system, and two
academic models. The feed-forward method is considered in this
work for modelling the nonlinear dynamic models, where the K-Means
clustering algorithm is used to select the centers of
the radial basis function network because it is reliable, offers fast
convergence and can handle large data sets. The least mean square
method is used to adjust the weights of the output layer, and
the Euclidean distance method is used to determine the width of the
Gaussian function.
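The training scheme described above can be sketched for a one-dimensional input. This is an illustration under stated assumptions: the centers are fixed stand-ins for what K-means would supply, the width rule (distance to the nearest other center) is one common choice that the abstract does not confirm, and the sine target, learning rate and function names are hypothetical.

```python
import math

def nearest_center_widths(centers):
    """Gaussian width = distance to the closest other center (assumed rule)."""
    return [min(abs(c - other) for other in centers if other != c)
            for c in centers]

def rbf_features(x, centers, widths):
    """Gaussian activations of the hidden layer for a scalar input x."""
    return [math.exp(-((x - c) ** 2) / (2.0 * w ** 2))
            for c, w in zip(centers, widths)]

def predict(x, centers, widths, weights):
    return sum(w * p for w, p in zip(weights, rbf_features(x, centers, widths)))

def train_lms(samples, centers, widths, rate=0.1, epochs=200):
    """Least mean square updates of the output-layer weights."""
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for x, target in samples:
            phi = rbf_features(x, centers, widths)
            error = target - sum(w * p for w, p in zip(weights, phi))
            weights = [w + rate * error * p for w, p in zip(weights, phi)]
    return weights

def mean_squared_error(samples, centers, widths, weights):
    return sum((t - predict(x, centers, widths, weights)) ** 2
               for x, t in samples) / len(samples)

# Hypothetical data: samples of sin(x) on [0, 3].
samples = [(i * 0.5, math.sin(i * 0.5)) for i in range(7)]
centers = [0.0, 1.0, 2.0, 3.0]  # stand-ins for K-means centers
widths = nearest_center_widths(centers)
weights = train_lms(samples, centers, widths)
error_before = mean_squared_error(samples, centers, widths, [0.0] * len(centers))
error_after = mean_squared_error(samples, centers, widths, weights)
```

Comparing `error_before` with `error_after` shows how much the LMS updates reduce the fitting error on the training samples.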
Abstract: This paper presents a text clustering system based on a k-means type subspace clustering algorithm to cluster large, high-dimensional and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by their weight values. For understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to WordNet to identify the set of nouns and consolidate synonyms and hyponyms. Experimental results have shown that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting the words that represent the topics of the clusters.
Abstract: In this paper, a novel algorithm based on the Ridgelet
transform and a support vector machine is proposed for human action
recognition. The Ridgelet transform is a directional multi-resolution
transform that is well suited to describing human action, as
its directional information is used to form spatial feature
vectors. The dynamic transition between the spatial features is
modelled using both Principal Component Analysis and the K-means
clustering algorithm. First, Principal Component Analysis is used
to reduce the dimensionality of the obtained vectors. Then, the
k-means algorithm is used to cluster the obtained vectors to form
the spatio-temporal pattern, called set-of-labels, according to the
given periodicity of the human action. Finally, a Support Vector
Machine classifier is used to discriminate between the different
human actions. Different
tests are conducted on popular datasets, such as Weizmann and
KTH. The obtained results show that the proposed method provides
a higher accuracy rate and is more robust in very
challenging situations such as lighting changes, scaling and dynamic
environments.
Abstract: Clustering is a very well-known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained with this technique depend on the initialization of the cluster centers, and the final solution converges to a local minimum. To overcome these shortcomings of the K-means algorithm, this paper proposes a hybrid evolutionary algorithm based on the combination of the PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partitions. Its performance is evaluated on several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means, on the partitional clustering problem.
Abstract: This paper presents a new approach for image
segmentation by applying the Pillar-Kmeans algorithm. This
segmentation process includes a new mechanism for clustering the
elements of high-resolution images in order to improve precision and
reduce computation time. The system applies K-means clustering to
the image segmentation after optimization by the Pillar algorithm. The
Pillar algorithm considers the placement of pillars, which should be
located as far as possible from each other to withstand the
pressure distribution of a roof, as analogous to the placement of
centroids amongst the data distribution. This algorithm is able to
optimize K-means clustering for image segmentation in terms of
precision and computation time. It designates the initial centroids'
positions by calculating the accumulated distance metric between each
data point and all previous centroids, and then selects the data
points that have the maximum distance as new initial centroids. This
algorithm distributes all initial centroids according to the maximum
accumulated distance metric. This paper evaluates the proposed
approach for image segmentation by comparing it with the K-means and
Gaussian Mixture Model algorithms and involving the RGB, HSV, HSL
and CIELAB color spaces. The experimental results clarify the
effectiveness of our approach in improving the segmentation quality in
terms of precision and computational time.
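The accumulated-distance seeding described above can be sketched as follows. This is a simplified illustration, not the full Pillar algorithm: choosing the first seed as the point farthest from the data mean is an assumption, any outlier handling is omitted, and the function name `accumulated_distance_seeds` and the sample points are hypothetical.

```python
import math

def accumulated_distance_seeds(points, k):
    """Pick k initial centroids by maximum accumulated distance.

    Each new seed is the data point whose summed Euclidean distance to all
    previously chosen seeds is largest.
    """
    dim = len(points[0])
    mean = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    # Assumption: start from the point farthest from the data mean.
    first = max(range(len(points)), key=lambda i: math.dist(points[i], mean))

    chosen = [first]
    accumulated = [0.0] * len(points)
    while len(chosen) < k:
        latest = points[chosen[-1]]
        for i, p in enumerate(points):
            accumulated[i] += math.dist(p, latest)
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=lambda i: accumulated[i]))
    return [points[i] for i in chosen]

# Hypothetical 2-D data with three loose groups.
points = [(0, 0), (1, 1), (12, 0), (11, 1), (5, 10)]
seeds = accumulated_distance_seeds(points, k=3)
```

The resulting seeds would then be handed to K-means as its initial centroids in place of a random initialization.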
Abstract: In this study, a K-Means-like clustering technique with a hierarchical initial set (HKM) has been implemented. The goal of this study is to show that clustering document sets enhances precision in information retrieval systems, as was shown by Bellot & El-Beze for the French language. A comparison is made between the traditional information retrieval system and the clustered one. The effect of increasing the number of clusters on precision is also studied. The indexing technique is Term Frequency * Inverse Document Frequency (TF * IDF). It has been found that Hierarchical K-Means-like clustering (HKM) with 3 clusters over 242 Arabic abstract documents from the Saudi Arabian National Computer Conference yields significant results compared with a traditional information retrieval system without clustering. Additionally, it has been found that it is not necessary to increase the number of clusters to improve precision further.
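The TF * IDF indexing mentioned above can be sketched as follows. This is a minimal illustration assuming raw term counts for TF and log(N / document frequency) for IDF; the abstract does not say which variant was used, and the function name `tf_idf` and the tokenized sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dictionary per tokenized document.

    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    n = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return [
        {term: count * math.log(n / doc_freq[term])
         for term, count in Counter(doc).items()}
        for doc in documents
    ]

# Hypothetical, already tokenized documents.
docs = [
    ["cluster", "document", "cluster"],
    ["document", "retrieval"],
    ["retrieval", "precision"],
]
weights = tf_idf(docs)
```

Terms that are frequent in one document but rare across the collection receive the largest weights, which is what makes the vectors useful for both clustering and retrieval.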