Abstract: In this paper, a clustering algorithm named KHarmonic
means (KHM) was employed in the training of Radial
Basis Function Networks (RBFNs). KHM organized the data in
clusters and determined the centres of the basis function. The popular
clustering algorithms, namely K-means (KM) and Fuzzy c-means
(FCM), are highly dependent on the initial identification of elements
that represent the cluster well. In KHM, the problem can be avoided.
This leads to improvement in the classification performance when
compared to other clustering algorithms. A comparison of the
classification accuracy was performed between KM, FCM and KHM.
The classification performance is based on the benchmark data sets:
Iris Plant, Diabetes and Breast Cancer. RBFN training with the KHM
algorithm shows better accuracy in classification problem.
Abstract: Intelligent systems based on machine learning
techniques, such as classification, clustering, are gaining wide spread
popularity in real world applications. This paper presents work on
developing a software system for predicting crop yield, for example
oil-palm yield, from climate and plantation data. At the core of our
system is a method for unsupervised partitioning of data for finding
spatio-temporal patterns in climate data using kernel methods which
offer strength to deal with complex data. This work gets inspiration
from the notion that a non-linear data transformation into some high
dimensional feature space increases the possibility of linear
separability of the patterns in the transformed space. Therefore, it
simplifies exploration of the associated structure in the data. Kernel
methods implicitly perform a non-linear mapping of the input data
into a high dimensional feature space by replacing the inner products
with an appropriate positive definite function. In this paper we
present a robust weighted kernel k-means algorithm incorporating
spatial constraints for clustering the data. The proposed algorithm
can effectively handle noise, outliers and auto-correlation in the
spatial data, for effective and efficient data analysis by exploring
patterns and structures in the data, and thus can be used for
predicting oil-palm yield by analyzing various factors affecting the
yield.
Abstract: The paper presents a complete discrete statistical framework, based on a novel vector quantization (VQ) front-end process. This new VQ approach performs an optimal distribution of VQ codebook components on HMM states. This technique that we named the distributed vector quantization (DVQ) of hidden Markov models, succeeds in unifying acoustic micro-structure and phonetic macro-structure, when the estimation of HMM parameters is performed. The DVQ technique is implemented through two variants. The first variant uses the K-means algorithm (K-means- DVQ) to optimize the VQ, while the second variant exploits the benefits of the classification behavior of neural networks (NN-DVQ) for the same purpose. The proposed variants are compared with the HMM-based baseline system by experiments of specific Arabic consonants recognition. The results show that the distributed vector quantization technique increase the performance of the discrete HMM system.
Abstract: Initial values of reference vectors have significant influence on recognition accuracy in LVQ. There are several existing techniques, such as SOM and k-means, for setting initial values of reference vectors, each of which has provided some positive results. However, those results are not sufficient for the improvement of recognition accuracy. This study proposes an ACO-used method for initializing reference vectors with an aim to achieve recognition accuracy higher than those obtained through conventional methods. Moreover, we will demonstrate the effectiveness of the proposed method by applying it to the wine data and English vowel data and comparing its results with those of conventional methods.
Abstract: This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.
Abstract: Hand gesture is an active area of research in the vision
community, mainly for the purpose of sign language recognition and
Human Computer Interaction. In this paper, we propose a system to
recognize alphabet characters (A-Z) and numbers (0-9) in real-time
from stereo color image sequences using Hidden Markov Models
(HMMs). Our system is based on three main stages; automatic segmentation
and preprocessing of the hand regions, feature extraction
and classification. In automatic segmentation and preprocessing stage,
color and 3D depth map are used to detect hands where the hand
trajectory will take place in further step using Mean-shift algorithm
and Kalman filter. In the feature extraction stage, 3D combined features
of location, orientation and velocity with respected to Cartesian
systems are used. And then, k-means clustering is employed for
HMMs codeword. The final stage so-called classification, Baum-
Welch algorithm is used to do a full train for HMMs parameters.
The gesture of alphabets and numbers is recognized using Left-Right
Banded model in conjunction with Viterbi algorithm. Experimental
results demonstrate that, our system can successfully recognize hand
gestures with 98.33% recognition rate.
Abstract: In this paper, a novel algorithm based on Ridgelet
Transform and support vector machine is proposed for human action
recognition. The Ridgelet transform is a directional multi-resolution
transform and it is more suitable for describing the human action by
performing its directional information to form spatial features
vectors. The dynamic transition between the spatial features is carried
out using both the Principal Component Analysis and clustering
algorithm K-means. First, the Principal Component Analysis is used
to reduce the dimensionality of the obtained vectors. Then, the kmeans
algorithm is then used to perform the obtained vectors to form
the spatio-temporal pattern, called set-of-labels, according to given
periodicity of human action. Finally, a Support Machine classifier is
used to discriminate between the different human actions. Different
tests are conducted on popular Datasets, such as Weizmann and
KTH. The obtained results show that the proposed method provides
more significant accuracy rate and it drives more robustness in very
challenging situations such as lighting changes, scaling and dynamic
environment
Abstract: Lung cancer accounts for the most cancer related deaths for men as well as for women. The identification of cancer associated genes and the related pathways are essential to provide an important possibility in the prevention of many types of cancer. In this work two filter approaches, namely the information gain and the biomarker identifier (BMI) are used for the identification of different types of small-cell and non-small-cell lung cancer. A new method to determine the BMI thresholds is proposed to prioritize genes (i.e., primary, secondary and tertiary) using a k-means clustering approach. Sets of key genes were identified that can be found in several pathways. It turned out that the modified BMI is well suited for microarray data and therefore BMI is proposed as a powerful tool for the search for new and so far undiscovered genes related to cancer.
Abstract: Despite the relatively large number of studies that
have examined the use of appeals in advertisements, research on the
use of appeals in green advertisements is still underdeveloped and
needs to be investigated further, as it is definitely a tool for marketers
to create illustrious ads. In this study, content analysis was employed
to examine the nature of green advertising appeals and to match the
appeals with the green advertisements. Two different types of green
print advertisings, product orientation and organizational image
orientation were used. Thirty highly educated participants with
different backgrounds were asked individually to ascertain three
appeals out of thirty-four given appeals found among forty real green
advertisements. To analyze participant responses and to group them
based on common appeals, two-step K-mean clustering is used. The
clustering solution indicates that eye-catching graphics and
imaginative appeals are highly notable in both types of green ads.
Depressed, meaningful and sad appeals are found to be highly used in
organizational image orientation ads, whereas, corporate image,
informative and natural appeals are found to be essential for product
orientation ads.
Abstract: Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.
Abstract: Documents clustering become an essential technology
with the popularity of the Internet. That also means that fast and
high-quality document clustering technique play core topics. Text
clustering or shortly clustering is about discovering semantically
related groups in an unstructured collection of documents. Clustering
has been very popular for a long time because it provides unique
ways of digesting and generalizing large amounts of information.
One of the issues of clustering is to extract proper feature (concept)
of a problem domain. The existing clustering technology mainly
focuses on term weight calculation. To achieve more accurate
document clustering, more informative features including concept
weight are important. Feature Selection is important for clustering
process because some of the irrelevant or redundant feature may
misguide the clustering results. To counteract this issue, the proposed
system presents the concept weight for text clustering system
developed based on a k-means algorithm in accordance with the
principles of ontology so that the important of words of a cluster can
be identified by the weight values. To a certain extent, it has resolved
the semantic problem in specific areas.
Abstract: Clustering in high dimensional space is a difficult
problem which is recurrent in many fields of science and
engineering, e.g., bioinformatics, image processing, pattern
reorganization and data mining. In high dimensional space some of
the dimensions are likely to be irrelevant, thus hiding the possible
clustering. In very high dimensions it is common for all the objects in
a dataset to be nearly equidistant from each other, completely
masking the clusters. Hence, performance of the clustering algorithm
decreases.
In this paper, we propose an algorithmic framework which
combines the (reduct) concept of rough set theory with the k-means
algorithm to remove the irrelevant dimensions in a high dimensional
space and obtain appropriate clusters. Our experiment on test data
shows that this framework increases efficiency of the clustering
process and accuracy of the results.
Abstract: Recently, a quality of motors is inspected by human
ears. In this paper, I propose two systems using a method of speech
recognition for automation of the inspection. The first system is based
on a method of linear processing which uses K-means and Nearest
Neighbor method, and the second is based on a method of non-linear
processing which uses neural networks. I used motor sounds in these
systems, and I successfully recognize 86.67% of motor sounds in the
linear processing system and 97.78% in the non-linear processing
system.
Abstract: This paper presents a new approach for image
segmentation by applying Pillar-Kmeans algorithm. This
segmentation process includes a new mechanism for clustering the
elements of high-resolution images in order to improve precision and
reduce computation time. The system applies K-means clustering to
the image segmentation after optimized by Pillar Algorithm. The
Pillar algorithm considers the pillars- placement which should be
located as far as possible from each other to withstand against the
pressure distribution of a roof, as identical to the number of centroids
amongst the data distribution. This algorithm is able to optimize the
K-means clustering for image segmentation in aspects of precision
and computation time. It designates the initial centroids- positions
by calculating the accumulated distance metric between each data
point and all previous centroids, and then selects data points which
have the maximum distance as new initial centroids. This algorithm
distributes all initial centroids according to the maximum
accumulated distance metric. This paper evaluates the proposed
approach for image segmentation by comparing with K-means and
Gaussian Mixture Model algorithm and involving RGB, HSV, HSL
and CIELAB color spaces. The experimental results clarify the
effectiveness of our approach to improve the segmentation quality in
aspects of precision and computational time.
Abstract: The prediction of Software quality during development life cycle of software project helps the development organization to make efficient use of available resource to produce the product of highest quality. “Whether a module is faulty or not" approach can be used to predict quality of a software module. There are numbers of software quality prediction models described in the literature based upon genetic algorithms, artificial neural network and other data mining algorithms. One of the promising aspects for quality prediction is based on clustering techniques. Most quality prediction models that are based on clustering techniques make use of K-means, Mixture-of-Guassians, Self-Organizing Map, Neural Gas and fuzzy K-means algorithm for prediction. In all these techniques a predefined structure is required that is number of neurons or clusters should be known before we start clustering process. But in case of Growing Neural Gas there is no need of predetermining the quantity of neurons and the topology of the structure to be used and it starts with a minimal neurons structure that is incremented during training until it reaches a maximum number user defined limits for clusters. Hence, in this work we have used Growing Neural Gas as underlying cluster algorithm that produces the initial set of labeled cluster from training data set and thereafter this set of clusters is used to predict the quality of test data set of software modules. The best testing results shows 80% accuracy in evaluating the quality of software modules. Hence, the proposed technique can be used by programmers in evaluating the quality of modules during software development.
Abstract: In this study a clustering technique has been implemented which is K-Means like with hierarchical initial set (HKM). The goal of this study is to prove that clustering document sets do enhancement precision on information retrieval systems, since it was proved by Bellot & El-Beze on French language. A comparison is made between the traditional information retrieval system and the clustered one. Also the effect of increasing number of clusters on precision is studied. The indexing technique is Term Frequency * Inverse Document Frequency (TF * IDF). It has been found that the effect of Hierarchical K-Means Like clustering (HKM) with 3 clusters over 242 Arabic abstract documents from the Saudi Arabian National Computer Conference has significant results compared with traditional information retrieval system without clustering. Additionally it has been found that it is not necessary to increase the number of clusters to improve precision more.