Abstract: Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changes. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with a test data of 100 to1000 genes and 24 samples and show promising results for applying this algorithm to clustering DNA chip data.
Abstract: In this paper, a new K-means clustering based
approach for identification of voltage control areas is developed.
Voltage control areas are important for efficient reactive power
management in power systems operating under deregulated
environment. Although, voltage control areas are formed using
conventional hierarchical clustering based method, but the present
paper investigate the capability of K-means clustering for the
purpose of forming voltage control areas. The proposed method is
tested and compared for IEEE 14 bus and IEEE 30 bus systems. The
results show that this K-means based method is competing with
conventional hierarchical approach
Abstract: An end-member selection method for spectral unmixing that is based on Particle Swarm Optimization (PSO) is developed in this paper. The algorithm uses the K-means clustering algorithm and a method of dynamic selection of end-members subsets to find the appropriate set of end-members for a given set of multispectral images. The proposed algorithm has been successfully applied to test image sets from various platforms such as LANDSAT 5 MSS and NOAA's AVHRR. The experimental results of the proposed algorithm are encouraging. The influence of different values of the algorithm control parameters on performance is studied. Furthermore, the performance of different versions of PSO is also investigated.
Abstract: In this paper, we propose an algorithm to compute
initial cluster centers for K-means clustering. Data in a cell is
partitioned using a cutting plane that divides cell in two smaller cells.
The plane is perpendicular to the data axis with the highest variance
and is designed to reduce the sum squared errors of the two cells as
much as possible, while at the same time keep the two cells far apart
as possible. Cells are partitioned one at a time until the number of
cells equals to the predefined number of clusters, K. The centers of
the K cells become the initial cluster centers for K-means. The
experimental results suggest that the proposed algorithm is effective,
converge to better clustering results than those of the random
initialization method. The research also indicated the proposed
algorithm would greatly improve the likelihood of every cluster
containing some data in it.
Abstract: In the present work, we propose a new technique to
enhance the learning capabilities and reduce the computation
intensity of a competitive learning multi-layered neural network
using the K-means clustering algorithm. The proposed model use
multi-layered network architecture with a back propagation learning
mechanism. The K-means algorithm is first applied to the training
dataset to reduce the amount of samples to be presented to the neural
network, by automatically selecting an optimal set of samples. The
obtained results demonstrate that the proposed technique performs
exceptionally in terms of both accuracy and computation time when
applied to the KDD99 dataset compared to a standard learning
schema that use the full dataset.
Abstract: The multiple traveling salesman problem (mTSP) can be used to model many practical problems. The mTSP is more complicated than the traveling salesman problem (TSP) because it requires determining which cities to assign to each salesman, as well as the optimal ordering of the cities within each salesman's tour. Previous studies proposed that Genetic Algorithm (GA), Integer Programming (IP) and several neural network (NN) approaches could be used to solve mTSP. This paper compared the results for mTSP, solved with Genetic Algorithm (GA) and Nearest Neighbor Algorithm (NNA). The number of cities is clustered into a few groups using k-means clustering technique. The number of groups depends on the number of salesman. Then, each group is solved with NNA and GA as an independent TSP. It is found that k-means clustering and NNA are superior to GA in terms of performance (evaluated by fitness function) and computing time.
Abstract: Geographic Profiling has successfully assisted investigations for serial crimes. Considering the multi-cluster feature of serial criminal spots, we propose a Multi-point Centrography model as a natural extension of Single-point Centrography for geographic profiling. K-means clustering is first performed on the data samples and then Single-point Centrography is adopted to derive a probability distribution on each cluster. Finally, a weighted combinations of each distribution is formed to make next-crime spot prediction. Experimental study on real cases demonstrates the effectiveness of our proposed model.
Abstract: This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.
Abstract: Hand gesture is an active area of research in the vision
community, mainly for the purpose of sign language recognition and
Human Computer Interaction. In this paper, we propose a system to
recognize alphabet characters (A-Z) and numbers (0-9) in real-time
from stereo color image sequences using Hidden Markov Models
(HMMs). Our system is based on three main stages; automatic segmentation
and preprocessing of the hand regions, feature extraction
and classification. In automatic segmentation and preprocessing stage,
color and 3D depth map are used to detect hands where the hand
trajectory will take place in further step using Mean-shift algorithm
and Kalman filter. In the feature extraction stage, 3D combined features
of location, orientation and velocity with respected to Cartesian
systems are used. And then, k-means clustering is employed for
HMMs codeword. The final stage so-called classification, Baum-
Welch algorithm is used to do a full train for HMMs parameters.
The gesture of alphabets and numbers is recognized using Left-Right
Banded model in conjunction with Viterbi algorithm. Experimental
results demonstrate that, our system can successfully recognize hand
gestures with 98.33% recognition rate.
Abstract: Lung cancer accounts for the most cancer related deaths for men as well as for women. The identification of cancer associated genes and the related pathways are essential to provide an important possibility in the prevention of many types of cancer. In this work two filter approaches, namely the information gain and the biomarker identifier (BMI) are used for the identification of different types of small-cell and non-small-cell lung cancer. A new method to determine the BMI thresholds is proposed to prioritize genes (i.e., primary, secondary and tertiary) using a k-means clustering approach. Sets of key genes were identified that can be found in several pathways. It turned out that the modified BMI is well suited for microarray data and therefore BMI is proposed as a powerful tool for the search for new and so far undiscovered genes related to cancer.
Abstract: This paper presents a new approach for image
segmentation by applying Pillar-Kmeans algorithm. This
segmentation process includes a new mechanism for clustering the
elements of high-resolution images in order to improve precision and
reduce computation time. The system applies K-means clustering to
the image segmentation after optimized by Pillar Algorithm. The
Pillar algorithm considers the pillars- placement which should be
located as far as possible from each other to withstand against the
pressure distribution of a roof, as identical to the number of centroids
amongst the data distribution. This algorithm is able to optimize the
K-means clustering for image segmentation in aspects of precision
and computation time. It designates the initial centroids- positions
by calculating the accumulated distance metric between each data
point and all previous centroids, and then selects data points which
have the maximum distance as new initial centroids. This algorithm
distributes all initial centroids according to the maximum
accumulated distance metric. This paper evaluates the proposed
approach for image segmentation by comparing with K-means and
Gaussian Mixture Model algorithm and involving RGB, HSV, HSL
and CIELAB color spaces. The experimental results clarify the
effectiveness of our approach to improve the segmentation quality in
aspects of precision and computational time.