Abstract: In this paper we use exponential particle swarm
optimization (EPSO) to cluster data. Then we compare between
(EPSO) clustering algorithm which depends on exponential variation
for the inertia weight and particle swarm optimization (PSO)
clustering algorithm which depends on linear inertia weight. This
comparison is evaluated on five data sets. The experimental results
show that EPSO clustering algorithm increases the possibility to find
the optimal positions as it decrease the number of failure. Also show
that (EPSO) clustering algorithm has a smaller quantization error
than (PSO) clustering algorithm, i.e. (EPSO) clustering algorithm
more accurate than (PSO) clustering algorithm.
Abstract: Modern times call organizations to have an active role
in the social arena, through Corporate Social Responsibility (CSR).
The objective of this research was to test the hypothesis that there is a
positive relation between social performance and economic
performance, and if there is a positive correlation between social
performance and financial-economic performance. To test these
theories a measure of social performance, based on the Green Book
of Commission of the European Community, was used in a group of
nineteen Portuguese top companies, listed on the PSI 20 index,
through a period of five years, since 2005 to 2009. A clusters
analysis was applied to group companies by their social performance
and to compare and correlate their economic performance. Results
indicate that companies that had a better social performance are not
the ones who had a better economic performance, and suggest that
the middle path might provide a good relation CSR-Economic
performance, as a basis to a sustainable development.
Abstract: Biological data has several characteristics that strongly differentiate it from typical business data. It is much more complex, usually large in size, and continuously changes. Until recently business data has been the main target for discovering trends, patterns or future expectations. However, with the recent rise in biotechnology, the powerful technology that was used for analyzing business data is now being applied to biological data. With the advanced technology at hand, the main trend in biological research is rapidly changing from structural DNA analysis to understanding cellular functions of the DNA sequences. DNA chips are now being used to perform experiments and DNA analysis processes are being used by researchers. Clustering is one of the important processes used for grouping together similar entities. There are many clustering algorithms such as hierarchical clustering, self-organizing maps, K-means clustering and so on. In this paper, we propose a clustering algorithm that imitates the ecosystem taking into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm. The system decides the number of clusters automatically. The system processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns clusters to the genes and displays the output. We tested the algorithm with a test data of 100 to1000 genes and 24 samples and show promising results for applying this algorithm to clustering DNA chip data.
Abstract: The self-organizing map (SOM) model is a well-known neural network model with wide spread of applications. The main characteristics of SOM are two-fold, namely dimension reduction and topology preservation. Using SOM, a high-dimensional data space will be mapped to some low-dimensional space. Meanwhile, the topological relations among data will be preserved. With such characteristics, the SOM was usually applied on data clustering and visualization tasks. However, the SOM has main disadvantage of the need to know the number and structure of neurons prior to training, which are difficult to be determined. Several schemes have been proposed to tackle such deficiency. Examples are growing/expandable SOM, hierarchical SOM, and growing hierarchical SOM. These schemes could dynamically expand the map, even generate hierarchical maps, during training. Encouraging results were reported. Basically, these schemes adapt the size and structure of the map according to the distribution of training data. That is, they are data-driven or dataoriented SOM schemes. In this work, a topic-oriented SOM scheme which is suitable for document clustering and organization will be developed. The proposed SOM will automatically adapt the number as well as the structure of the map according to identified topics. Unlike other data-oriented SOMs, our approach expands the map and generates the hierarchies both according to the topics and their characteristics of the neurons. The preliminary experiments give promising result and demonstrate the plausibility of the method.
Abstract: In this paper, we present a new algorithm for clustering data in large datasets using image processing approaches. First the dataset is mapped into a binary image plane. The synthesized image is then processed utilizing efficient image processing techniques to cluster the data in the dataset. Henceforth, the algorithm avoids exhaustive search to identify clusters. The algorithm considers only a small set of the data that contains critical boundary information sufficient to identify contained clusters. Compared to available data clustering techniques, the proposed algorithm produces similar quality results and outperforms them in execution time and storage requirements.
Abstract: The colors of the human skin represent a special
category of colors, because they are distinctive from the colors of
other natural objects. This category is found as a cluster in color
spaces, and the skin color variations between people are mostly due
to differences in the intensity. Besides, the face detection based on
skin color detection is a faster method as compared to other
techniques. In this work, we present a system to track faces by
carrying out skin color detection in four different color spaces: HSI,
YCbCr, YES and RGB. Once some skin color regions have been
detected for each color space, we label each and get some
characteristics such as size and position. We are supposing that a face
is located in one the detected regions. Next, we compare and employ
a polling strategy between labeled regions to determine the final
region where the face effectively has been detected and located.
Abstract: The Chiu-s method which generates a Takagi-Sugeno Fuzzy Inference System (FIS) is a method of fuzzy rules extraction. The rules output is a linear function of inputs. In addition, these rules are not explicit for the expert. In this paper, we develop a method which generates Mamdani FIS, where the rules output is fuzzy. The method proceeds in two steps: first, it uses the subtractive clustering principle to estimate both the number of clusters and the initial locations of a cluster centers. Each obtained cluster corresponds to a Mamdani fuzzy rule. Then, it optimizes the fuzzy model parameters by applying a genetic algorithm. This method is illustrated on a traffic network management application. We suggest also a Mamdani fuzzy rules generation method, where the expert wants to classify the output variables in some fuzzy predefined classes.
Abstract: Sense-antisense gene pair (SAGP) is a pair of two oppositely transcribed genes sharing a common region on a chromosome. In the mammalian genomes, SAGPs can be organized in more complex sense-antisense gene architectures (CSAGA) in which at least one gene could share loci with two or more antisense partners. Many dozens of CSAGAs can be found in the human genome. However, CSAGAs have not been systematically identified and characterized in context of their role in human diseases including cancers. In this work we characterize the structural-functional properties of a cluster of 5 genes –TMEM97, IFT20, TNFAIP1, POLDIP2 and TMEM199, termed TNFAIP1 / POLDIP2 module. This cluster is organized as CSAGA in cytoband 17q11.2. Affymetrix U133A&B expression data of two large cohorts (410 atients, in total) of breast cancer patients and patient survival data were used. For the both studied cohorts, we demonstrate (i) strong and reproducible transcriptional co-regulatory patterns of genes of TNFAIP1/POLDIP2 module in breast cancer cell subtypes and (ii) significant associations of TNFAIP1/POLDIP2 CSAGA with amplification of the CSAGA region in breast cancer, (ii) cancer aggressiveness (e.g. genetic grades) and (iv) disease free patient-s survival. Moreover, gene pairs of this module demonstrate strong synergetic effect in the prognosis of time of breast cancer relapse. We suggest that TNFAIP1/ POLDIP2 cluster can be considered as a novel type of structural-functional gene modules in the human genome.
Abstract: In this paper, a clustering algorithm named KHarmonic
means (KHM) was employed in the training of Radial
Basis Function Networks (RBFNs). KHM organized the data in
clusters and determined the centres of the basis function. The popular
clustering algorithms, namely K-means (KM) and Fuzzy c-means
(FCM), are highly dependent on the initial identification of elements
that represent the cluster well. In KHM, the problem can be avoided.
This leads to improvement in the classification performance when
compared to other clustering algorithms. A comparison of the
classification accuracy was performed between KM, FCM and KHM.
The classification performance is based on the benchmark data sets:
Iris Plant, Diabetes and Breast Cancer. RBFN training with the KHM
algorithm shows better accuracy in classification problem.
Abstract: A clustering based technique has been developed and implemented for Short Term Load Forecasting, in this article. Formulation has been done using Mean Absolute Percentage Error (MAPE) as an objective function. Data Matrix and cluster size are optimization variables. Model designed, uses two temperature variables. This is compared with six input Radial Basis Function Neural Network (RBFNN) and Fuzzy Inference Neural Network (FINN) for the data of the same system, for same time period. The fuzzy inference system has the network structure and the training procedure of a neural network which initially creates a rule base from existing historical load data. It is observed that the proposed clustering based model is giving better forecasting accuracy as compared to the other two methods. Test results also indicate that the RBFNN can forecast future loads with accuracy comparable to that of proposed method, where as the training time required in the case of FINN is much less.
Abstract: There are several approaches for handling multiclass classification. Aside from one-against-one (OAO) and one-against-all (OAA), hierarchical classification technique is also commonly used. A binary classification tree is a hierarchical classification structure that breaks down a k-class problem into binary sub-problems, each solved by a binary classifier. In each node, a set of classes is divided into two subsets. A good class partition should be able to group similar classes together. Many algorithms measure similarity in term of distance between class centroids. Classes are grouped together by a clustering algorithm when distances between their centroids are small. In this paper, we present a binary classification tree with tuned observation-based clustering (BCT-TOB) that finds a class partition by performing clustering on observations instead of class centroids. A merging step is introduced to merge any insignificant class split. The experiment shows that performance of BCT-TOB is comparable to other algorithms.
Abstract: This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.
Abstract: This paper presents the benchmarking results and
performance evaluation of differentclustersbuilt atthe National Center
for High-Performance Computingin Taiwan. Performance of
processor, memory subsystem andinterconnect is a critical factor in the
overall performance of high performance computing platforms. The
evaluation compares different system architecture and software
platforms. Most supercomputer used HPL to benchmark their system
performance, in accordance with the requirement of the TOP500 List.
In this paper we consider system memory access factors that affect
benchmark performance, such as processor and memory
performance.We hope these works will provide useful information for
future development and construct cluster system.
Abstract: The vast amount of information hidden in huge
databases has created tremendous interests in the field of data
mining. This paper examines the possibility of using data clustering
techniques in oral medicine to identify functional relationships
between different attributes and classification of similar patient
examinations. Commonly used data clustering algorithms have been
reviewed and as a result several interesting results have been
gathered.
Abstract: The Cluster Dimension of a network is defined as, which is the minimum cardinality of a subset S of the set of nodes having the property that for any two distinct nodes x and y, there exist the node Si, s2 (need not be distinct) in S such that ld(x,s1) — d(y, s1)1 > 1 and d(x,s2) < d(x,$) for all s E S — {s2}. In this paper, strictly non overlap¬ping clusters are constructed. The concept of LandMarks for Unique Addressing and Clustering (LMUAC) routing scheme is developed. With the help of LMUAC routing scheme, It is shown that path length (upper bound)PLN,d < PLD, Maximum memory space requirement for the networkMSLmuAc(Az) < MSEmuAc < MSH3L < MSric and Maximum Link utilization factor MLLMUAC(i=3) < MLLMUAC(z03) < M Lc
Abstract: In this paper we used data mining techniques to
identify outlier patients who are using large amount of drugs over a
long period of time. Any healthcare or health insurance system
should deal with the quantities of drugs utilized by chronic diseases
patients. In Kingdom of Bahrain, about 20% of health budget is spent
on medications. For the managers of healthcare systems, there is no
enough information about the ways of drug utilization by chronic
diseases patients, is there any misuse or is there outliers patients. In
this work, which has been done in cooperation with information
department in the Bahrain Defence Force hospital; we select the data
for Cardiac patients in the period starting from 1/1/2008 to
December 31/12/2008 to be the data for the model in this paper. We
used three techniques for finding the drug utilization for cardiac
patients. First we applied a clustering technique, followed by
measuring of clustering validity, and finally we applied a decision
tree as classification algorithm. The clustering results is divided into
three clusters according to the drug utilization, for 1603 patients, who
received 15,806 prescriptions during this period can be partitioned
into three groups, where 23 patients (2.59%) who received 1316
prescriptions (8.32%) are classified to be outliers. The classification
algorithm shows that the use of average drug utilization and the age,
and the gender of the patient can be considered to be the main
predictive factors in the induced model.
Abstract: Terminal localization for indoor Wireless Local Area
Networks (WLANs) is critical for the deployment of location-aware
computing inside of buildings. A major challenge is obtaining high
localization accuracy in presence of fluctuations of the received signal
strength (RSS) measurements caused by multipath fading. This paper
focuses on reducing the effect of the distance-varying noise by spatial
filtering of the measured RSS. Two different survey point geometries
are tested with the noise reduction technique: survey points arranged
in sets of clusters and survey points uniformly distributed over the
network area. The results show that the location accuracy improves
by 16% when the filter is used and by 18% when the filter is applied
to a clustered survey set as opposed to a straight-line survey set.
The estimated locations are within 2 m of the true location, which
indicates that clustering the survey points provides better localization
accuracy due to superior noise removal.
Abstract: Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.
Abstract: Documents clustering become an essential technology
with the popularity of the Internet. That also means that fast and
high-quality document clustering technique play core topics. Text
clustering or shortly clustering is about discovering semantically
related groups in an unstructured collection of documents. Clustering
has been very popular for a long time because it provides unique
ways of digesting and generalizing large amounts of information.
One of the issues of clustering is to extract proper feature (concept)
of a problem domain. The existing clustering technology mainly
focuses on term weight calculation. To achieve more accurate
document clustering, more informative features including concept
weight are important. Feature Selection is important for clustering
process because some of the irrelevant or redundant feature may
misguide the clustering results. To counteract this issue, the proposed
system presents the concept weight for text clustering system
developed based on a k-means algorithm in accordance with the
principles of ontology so that the important of words of a cluster can
be identified by the weight values. To a certain extent, it has resolved
the semantic problem in specific areas.
Abstract: Clustering in high dimensional space is a difficult
problem which is recurrent in many fields of science and
engineering, e.g., bioinformatics, image processing, pattern
reorganization and data mining. In high dimensional space some of
the dimensions are likely to be irrelevant, thus hiding the possible
clustering. In very high dimensions it is common for all the objects in
a dataset to be nearly equidistant from each other, completely
masking the clusters. Hence, performance of the clustering algorithm
decreases.
In this paper, we propose an algorithmic framework which
combines the (reduct) concept of rough set theory with the k-means
algorithm to remove the irrelevant dimensions in a high dimensional
space and obtain appropriate clusters. Our experiment on test data
shows that this framework increases efficiency of the clustering
process and accuracy of the results.