Abstract: The initial values of reference vectors have a significant influence on recognition accuracy in LVQ. Several existing techniques, such as SOM and k-means, can be used to set the initial values of reference vectors, and each has produced some positive results. However, those results are not sufficient to improve recognition accuracy. This study proposes an ACO-based method for initializing reference vectors, with the aim of achieving higher recognition accuracy than conventional methods. We demonstrate the effectiveness of the proposed method by applying it to the wine data and English vowel data and comparing its results with those of conventional methods.
Abstract: Data mining uses a variety of techniques, each of which is useful for some particular task. It is important to have a deep understanding of each technique and to be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and support the entire data mining process up to results visualization. A graphical representation helps the user devise a strategy to optimize classification by adding, moving or deleting a neuron in order to change the number of classes. The tool can also automatically suggest a strategy for optimizing the number of classes. The tool is used to classify macroeconomic data that report the most developed countries' imports and exports. It is possible to classify the countries based on their economic behaviour and to use an ad hoc tool to characterize the commercial behaviour of a country in a selected class by analysing the positive and negative features that contribute to class formation.
Abstract: Traffic lights guard against traffic congestion by reducing traffic jams and organizing the traffic flow. Furthermore, the increasing congestion level in public road networks is a growing problem in many countries. Using Intelligent Transportation Systems to give emergency vehicles a green light at intersections can reduce driver confusion, reduce conflicts, and improve emergency response times. Nowadays, wireless sensor network technology can solve many problems and can offer good management of the crossroad. In this paper, we develop a new approach based on clustering and graphical possibilistic fusion modeling. The proposed model is elaborated in three phases. The first consists of decomposing the environment into clusters, followed by the intra-cluster and inter-cluster fusion processes. Finally, we show experimental results obtained by simulation that demonstrate the efficiency of the proposed approach.
Keywords: Traffic light, Wireless sensor network, Controller, Possibilistic network/Bayesian network.
Abstract: There are several approaches for handling multiclass classification. Aside from one-against-one (OAO) and one-against-all (OAA), hierarchical classification is also commonly used. A binary classification tree is a hierarchical classification structure that breaks down a k-class problem into binary sub-problems, each solved by a binary classifier. In each node, a set of classes is divided into two subsets. A good class partition should be able to group similar classes together. Many algorithms measure similarity in terms of the distance between class centroids: classes are grouped together by a clustering algorithm when the distances between their centroids are small. In this paper, we present a binary classification tree with tuned observation-based clustering (BCT-TOB) that finds a class partition by performing clustering on observations instead of class centroids. A merging step is introduced to merge any insignificant class split. The experiment shows that the performance of BCT-TOB is comparable to that of other algorithms.
Abstract: Increasing the detection rate and reducing the false
positive rate are important problems in Intrusion Detection Systems (IDS).
Although preventative techniques such as access control and
authentication attempt to prevent intruders, these can fail, and as a
second line of defence, intrusion detection has been introduced. Rare
events are events that occur very infrequently, and the detection of rare
events is a common problem in many domains. In this paper we
propose an intrusion detection method that combines Rough set and
Fuzzy Clustering. Rough set is used to decrease the amount of data and
get rid of redundancy. Fuzzy c-means clustering allows objects to
belong to several clusters simultaneously, with different degrees of
membership. Our approach allows us to recognize not only known
attacks but also to detect suspicious activity that may be the result of
a new, unknown attack. Experimental results on the Knowledge
Discovery and Data Mining (KDD Cup 1999) dataset show that the
method is efficient and practical for intrusion detection systems.
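The graded membership idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows the standard fuzzy c-means membership update for fixed cluster centers, and the function name `fcm_memberships`, the fuzzifier value `m=2.0`, and the toy data are assumptions made for illustration.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which points[i] belongs to centers[j].

    Standard fuzzy c-means update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
    The fuzzifier m > 1 is an assumed parameter, not taken from the abstract.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # A point lying exactly on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        memberships.append(row)
    return memberships

# Hypothetical toy data: two centers on a line, three points.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, centers)
print(u[1])  # roughly [0.9, 0.1]: mostly in the first cluster, partly in the second
```

Each row of `u` sums to 1, so a record that sits between two clusters receives a split membership instead of a hard label.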
Abstract: This paper presents a text clustering system based on a k-means type subspace clustering algorithm for clustering large, high-dimensional and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To aid understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to WordNet to identify the set of nouns and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting the words that represent the topics of the clusters.
Abstract: Segmentation and quantification of stenosis is an
important task in assessing coronary artery disease. One of the main
challenges is measuring the real diameter of curved vessels.
Moreover, uncertainty in segmentation of different tissues in the
narrow vessel is an important issue that affects accuracy. This paper
proposes an algorithm to extract coronary arteries and measure the
degree of stenosis. A Markovian fuzzy clustering method is applied to
model the uncertainty arising from the partial volume effect. The
algorithm comprises segmentation, centreline extraction, estimation of
the plane orthogonal to the centreline, and measurement of the degree of
stenosis. To evaluate the accuracy and reproducibility, the approach
has been applied to a vascular phantom and the results are compared
with real diameter. The results of 10 patient datasets have been
visually judged by a qualified radiologist. The results reveal the
superiority of the proposed method compared to the conventional
thresholding method (CTM) on both datasets.
Abstract: This study introduces a new method for detecting,
sorting, and localizing spikes from multiunit EEG recordings. The
method combines the wavelet transform, which localizes distinctive
spike features, with the Super-Paramagnetic Clustering (SPC) algorithm,
which allows automatic classification of the data without assumptions
such as low variance or Gaussian distributions. Moreover, the method
is capable of setting amplitude thresholds for spike detection. The
method is applied to several real EEG data sets, in which the
spikes are detected and clustered and their times are determined.
Abstract: To further advance research on immune-related genes
from T. molitor, we constructed a cDNA library and analyzed
expressed sequence tag (EST) sequences from 1,056 clones. After
removing vector sequence and quality checking through the Phred
program (trim_alt 0.05, P-score>20), 1,039 sequences were generated.
The average length of insert was 792 bp. In addition, we identified 162
clusters, 167 contigs and 391 contigs after the clustering and assembly
process using the TGICL package. EST sequences were searched against
NCBI nr database by local BLAST (blastx, E
Abstract: A computationally simple approach to model order
reduction for single-input single-output (SISO), linear time-invariant
discrete systems modeled in the frequency domain is proposed
in this paper. The denominator of the reduced-order model is determined
using fuzzy C-means clustering, while the numerator parameters are
found by matching time moments and Markov parameters of the
high-order system.
Abstract: Most fuzzy clustering algorithms have some
deficiencies, e.g. they are not able to detect clusters with convex
shapes, the number of clusters must be known a priori, they
suffer from numerical problems, like sensitivity to the
initialization, etc. This paper studies the synergistic combination of
the hierarchical and graph theoretic minimal spanning tree based
clustering algorithm with the partitional Gath-Geva fuzzy clustering
algorithm. The aim of this hybridization is to increase the robustness
and consistency of the clustering results and to decrease the number
of heuristically defined parameters of these algorithms, in order to
reduce the influence of the user on the clustering results. For the
analysis of the resulting fuzzy clusters, a new tool based on a fuzzy
similarity measure is presented. The calculated similarities of the
clusters can be used for the hierarchical clustering of the resulting
fuzzy clusters; this information is useful for cluster merging and
for the visualization of the clustering results. As the examples used
for the illustration of the operation of the new algorithm will show,
the proposed algorithm can detect clusters from data with arbitrary
shape and does not suffer from the numerical problems of the
classical Gath-Geva fuzzy clustering algorithm.
Abstract: Hand gesture recognition is an active area of research in the vision
community, mainly for the purpose of sign language recognition and
Human Computer Interaction. In this paper, we propose a system to
recognize alphabet characters (A-Z) and numbers (0-9) in real-time
from stereo color image sequences using Hidden Markov Models
(HMMs). Our system is based on three main stages: automatic segmentation
and preprocessing of the hand regions, feature extraction,
and classification. In the automatic segmentation and preprocessing stage,
color and a 3D depth map are used to detect the hands, and the hand
trajectory is then tracked using the Mean-shift algorithm
and a Kalman filter. In the feature extraction stage, 3D combined features
of location, orientation and velocity with respect to Cartesian
systems are used. Then, k-means clustering is employed to obtain the
HMM codewords. In the final stage, classification, the Baum-Welch
algorithm is used to fully train the HMM parameters.
The gestures of alphabets and numbers are recognized using the Left-Right
Banded model in conjunction with the Viterbi algorithm. Experimental
results demonstrate that our system can successfully recognize hand
gestures with 98.33% recognition rate.
Abstract: In this paper, a novel algorithm based on Ridgelet
Transform and support vector machine is proposed for human action
recognition. The Ridgelet transform is a directional multi-resolution
transform, and it is well suited to describing human action by
using its directional information to form spatial feature
vectors. The dynamic transition between the spatial features is captured
using both Principal Component Analysis and the K-means clustering
algorithm. First, Principal Component Analysis is used
to reduce the dimensionality of the obtained vectors. Then, the k-means
algorithm is used to group the obtained vectors to form
the spatio-temporal pattern, called set-of-labels, according to the given
periodicity of the human action. Finally, a Support Vector Machine classifier is
used to discriminate between the different human actions. Different
tests are conducted on popular datasets, such as Weizmann and
KTH. The obtained results show that the proposed method achieves
a higher accuracy rate and is more robust in very
challenging situations such as lighting changes, scaling and dynamic
environments.
Abstract: The vast amount of information hidden in huge
databases has created tremendous interest in the field of data
mining. This paper examines the possibility of using data clustering
techniques in oral medicine to identify functional relationships
between different attributes and classification of similar patient
examinations. Commonly used data clustering algorithms have been
reviewed and as a result several interesting results have been
gathered.
Abstract: Lung cancer accounts for the most cancer-related deaths for men as well as for women. The identification of cancer-associated genes and the related pathways is essential for the prevention of many types of cancer. In this work two filter approaches, namely the information gain and the biomarker identifier (BMI), are used for the identification of different types of small-cell and non-small-cell lung cancer. A new method to determine the BMI thresholds is proposed to prioritize genes (i.e., primary, secondary and tertiary) using a k-means clustering approach. Sets of key genes were identified that can be found in several pathways. The modified BMI turned out to be well suited for microarray data, and BMI is therefore proposed as a powerful tool in the search for new and so far undiscovered genes related to cancer.
Abstract: Despite the relatively large number of studies that
have examined the use of appeals in advertisements, research on the
use of appeals in green advertisements is still underdeveloped and
needs to be investigated further, as it is definitely a tool for marketers
to create illustrious ads. In this study, content analysis was employed
to examine the nature of green advertising appeals and to match the
appeals with the green advertisements. Two different types of green
print advertisements, product orientation and organizational image
orientation were used. Thirty highly educated participants with
different backgrounds were asked individually to ascertain three
appeals out of thirty-four given appeals found among forty real green
advertisements. To analyze participant responses and to group them
based on common appeals, two-step K-means clustering is used. The
clustering solution indicates that eye-catching graphics and
imaginative appeals are highly notable in both types of green ads.
Depressed, meaningful and sad appeals are found to be highly used in
organizational image orientation ads, whereas corporate image,
informative and natural appeals are found to be essential for product
orientation ads.
Abstract: This study proposes a novel hybrid approach combining social network analysis and collaborative filtering to enhance the performance of recommender systems. The proposed model selects subgroups of users in an Internet community through social network analysis (SNA), and then performs clustering analysis using the information about the subgroups. Finally, it makes recommendations using cluster-indexing CF based on the clustering results. This study uses the cores of the subgroups as initial seeds for a conventional clustering algorithm. The model chooses the five cores with the highest degree centrality from SNA, and then performs clustering analysis using these cores as initial centroids (cluster centers). Then, the model amplifies the impact of friends in the social network in the process of cluster-indexing CF.
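The seeding step described in the abstract above (pick the users with the highest degree centrality, then use them as initial centroids) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the friendship network is assumed to be a symmetric 0/1 adjacency matrix, the clustering algorithm is assumed to be plain k-means, and all function names and the toy data are hypothetical.

```python
import math

def degree_centrality(adjacency):
    """Degree centrality of each user: number of friends divided by (n - 1)."""
    n = len(adjacency)
    return [sum(row) / (n - 1) for row in adjacency]

def top_core_users(adjacency, k):
    """Indices of the k users with the highest degree centrality."""
    scores = degree_centrality(adjacency)
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]

def kmeans_from_seeds(vectors, seeds, iterations=20):
    """Plain k-means (Lloyd's algorithm) started from the given seed centroids."""
    centroids = [list(s) for s in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each user goes to the nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy network: users 0 and 3 each have two friends.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
# Hypothetical rating vectors, one per user.
ratings = [(5, 1), (4, 1), (5, 2), (1, 5), (1, 4), (2, 5)]

cores = top_core_users(adjacency, k=2)
labels, centroids = kmeans_from_seeds(ratings, [ratings[i] for i in cores])
print(cores, labels)
```

The abstract uses five cores; `k=2` here only keeps the toy example small.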
Abstract: The interdependences among stock market indices
have long been studied by academics around the world. The
current financial crisis opened the door to a wide range of opinions
concerning the understanding and measurement of the connections
considered to provide the controversial phenomenon of market
integration. Using data on the log-returns of 17 stock market indices
that include most of the CEE markets, from 2005 until 2009, our
paper studies the problem of these dependences using a new
methodological tool that takes into account both the volatility
clustering effect and the stochastic properties of these linkages
through a Dynamic Conditional System of Simultaneous Equations.
We find that the crisis is well captured by our model as it provides
evidence for the high volatility – high dependence effect.
Abstract: Clustering is an interesting data mining topic
that can be applied in many fields. Recently, the problem of cluster
analysis has been formulated as a problem of nonsmooth, nonconvex optimization,
and an algorithm for solving the cluster analysis problem
based on nonsmooth optimization techniques has been developed. This
optimization problem has a number of characteristics that make it
challenging: it has many local minima, the optimization variables
can be either continuous or categorical, and there are no exact
analytical derivatives. In this study we show how to apply a particular
class of optimization methods known as pattern search methods
to address these challenges. These methods do not explicitly use
derivatives, an important feature that has not been addressed in
previous studies. Results of numerical experiments are presented
which demonstrate the effectiveness of the proposed method.
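To make the idea of a derivative-free pattern search concrete, here is a minimal sketch of compass search, one of the simplest members of that class, applied to a sum-of-squared-distances clustering objective. It is not the method of the paper above: the objective, the function names, the step-halving rule, and the toy data are assumptions, and the sketch handles continuous variables only.

```python
import math

def clustering_cost(flat_centers, points, dim):
    """Sum over all points of the squared distance to the nearest center."""
    centers = [flat_centers[i:i + dim] for i in range(0, len(flat_centers), dim)]
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10000):
    """Derivative-free compass search: poll +/- step along each coordinate."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                ft = f(trial)
                if ft < fx:
                    # Accept the first improving poll point on this axis.
                    x, fx, improved = trial, ft, True
                    break
        if not improved:
            step *= 0.5  # shrink the pattern when no poll point improves
    return x, fx

# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
start = [1.0, 0.0, 9.0, 0.0]  # two 2-D centers, flattened into one vector
best, cost = compass_search(lambda x: clustering_cost(x, points, 2), start)
print(best, cost)
```

Only function values are compared, so no gradient of the nonsmooth objective is needed. Like any local method, the result depends on the starting centers.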
Abstract: The Cluster Dimension of a network is defined as the minimum cardinality of a subset $S$ of the set of nodes having the property that for any two distinct nodes $x$ and $y$, there exist nodes $s_1, s_2$ (not necessarily distinct) in $S$ such that $|d(x,s_1) - d(y,s_1)| > 1$ and $d(x,s_2) < d(x,s)$ for all $s \in S - \{s_2\}$. In this paper, strictly non-overlapping clusters are constructed. The concept of the LandMarks for Unique Addressing and Clustering (LMUAC) routing scheme is developed. With the help of the LMUAC routing scheme, it is shown that the path length (upper bound) PLN,d < PLD, the maximum memory space requirement for the network MSLmuAc(Az) < MSEmuAc < MSH3L < MSric, and the maximum link utilization factor MLLMUAC(i=3) < MLLMUAC(z03) < M Lc