Abstract: In this study, a new criterion for determining the number of classes an image should be segmented is proposed. This criterion is based on discriminant analysis for measuring the separability among the segmented classes of pixels. Based on the new discriminant criterion, two algorithms for recursively segmenting the image into determined number of classes are proposed. The proposed methods can automatically and correctly segment objects with various illuminations into separated images for further processing. Experiments on the extraction of text strings from complex document images demonstrate the effectiveness of the proposed methods.1
Abstract: Knowing about the customer behavior in a grocery has
been a long-standing issue in the retailing industry. The advent of
RFID has made it easier to collect moving data for an individual
shopper's behavior. Most of the previous studies used the traditional
statistical clustering technique to find the major characteristics of
customer behavior, especially shopping path. However, in using the
clustering technique, due to various spatial constraints in the store,
standard clustering methods are not feasible because moving data such
as the shopping path should be adjusted in advance of the analysis,
which is time-consuming and causes data distortion. To alleviate this
problem, we propose a new approach to spatial pattern clustering
based on the longest common subsequence. Experimental results using
real data obtained from a grocery confirm the good performance of the
proposed method in finding the hot spot, dead spot and major path
patterns of customer movements.
Abstract: Fuzzy C-means Clustering algorithm (FCM) is a
method that is frequently used in pattern recognition. It has the
advantage of giving good modeling results in many cases, although,
it is not capable of specifying the number of clusters by itself. In
FCM algorithm most researchers fix weighting exponent (m) to a
conventional value of 2 which might not be the appropriate for all
applications. Consequently, the main objective of this paper is to use
the subtractive clustering algorithm to provide the optimal number of
clusters needed by FCM algorithm by optimizing the parameters of
the subtractive clustering algorithm by an iterative search approach
and then to find an optimal weighting exponent (m) for the FCM
algorithm. In order to get an optimal number of clusters, the iterative
search approach is used to find the optimal single-output Sugenotype
Fuzzy Inference System (FIS) model by optimizing the
parameters of the subtractive clustering algorithm that give minimum
least square error between the actual data and the Sugeno fuzzy
model. Once the number of clusters is optimized, then two
approaches are proposed to optimize the weighting exponent (m) in
the FCM algorithm, namely, the iterative search approach and the
genetic algorithms. The above mentioned approach is tested on the
generated data from the original function and optimal fuzzy models
are obtained with minimum error between the real data and the
obtained fuzzy models.
Abstract: A predictive clustering hybrid regression (pCHR)
approach was developed and evaluated using dataset from H2-
producing sucrose-based bioreactor operated for 15 months. The aim
was to model and predict the H2-production rate using information
available about envirome and metabolome of the bioprocess. Selforganizing
maps (SOM) and Sammon map were used to visualize the
dataset and to identify main metabolic patterns and clusters in
bioprocess data. Three metabolic clusters: acetate coupled with other
metabolites, butyrate only, and transition phases were detected. The
developed pCHR model combines principles of k-means clustering,
kNN classification and regression techniques. The model performed
well in modeling and predicting the H2-production rate with mean
square error values of 0.0014 and 0.0032, respectively.
Abstract: Network security attacks are the violation of
information security policy that received much attention to the
computational intelligence society in the last decades. Data mining
has become a very useful technique for detecting network intrusions
by extracting useful knowledge from large number of network data
or logs. Naïve Bayesian classifier is one of the most popular data
mining algorithm for classification, which provides an optimal way
to predict the class of an unknown example. It has been tested that
one set of probability derived from data is not good enough to have
good classification rate. In this paper, we proposed a new learning
algorithm for mining network logs to detect network intrusions
through naïve Bayesian classifier, which first clusters the network
logs into several groups based on similarity of logs, and then
calculates the prior and conditional probabilities for each group of
logs. For classifying a new log, the algorithm checks in which cluster
the log belongs and then use that cluster-s probability set to classify
the new log. We tested the performance of our proposed algorithm by
employing KDD99 benchmark network intrusion detection dataset,
and the experimental results proved that it improves detection rates
as well as reduces false positives for different types of network
intrusions.
Abstract: This paper describes about the process of recognition and classification of brain images such as normal and abnormal based on PSO-SVM. Image Classification is becoming more important for medical diagnosis process. In medical area especially for diagnosis the abnormality of the patient is classified, which plays a great role for the doctors to diagnosis the patient according to the severeness of the diseases. In case of DICOM images it is very tough for optimal recognition and early detection of diseases. Our work focuses on recognition and classification of DICOM image based on collective approach of digital image processing. For optimal recognition and classification Particle Swarm Optimization (PSO), Genetic Algorithm (GA) and Support Vector Machine (SVM) are used. The collective approach by using PSO-SVM gives high approximation capability and much faster convergence.
Abstract: Monitoring the tool flank wear without affecting the
throughput is considered as the prudent method in production
technology. The examination has to be done without affecting the
machining process. In this paper we proposed a novel work that is
used to determine tool flank wear by observing the sound signals
emitted during the turning process. The work-piece material we used
here is steel and aluminum and the cutting insert was carbide
material. Two different cutting speeds were used in this work. The
feed rate and the cutting depth were constant whereas the flank wear
was a variable. The emitted sound signal of a fresh tool (0 mm flank
wear) a slightly worn tool (0.2 -0.25 mm flank wear) and a severely
worn tool (0.4mm and above flank wear) during turning process were
recorded separately using a high sensitive microphone. Analysis
using Singular Value Decomposition was done on these sound
signals to extract the feature sound components. Observation of the
results showed that an increase in tool flank wear correlates with an
increase in the values of SVD features produced out of the sound
signals for both the materials. Hence it can be concluded that wear
monitoring of tool flank during turning process using SVD features
with the Fuzzy C means classification on the emitted sound signal is
a potential and relatively simple method.
Abstract: Prediction of fault-prone modules provides one way to
support software quality engineering. Clustering is used to determine
the intrinsic grouping in a set of unlabeled data. Among various
clustering techniques available in literature K-Means clustering
approach is most widely being used. This paper introduces K-Means
based Clustering approach for software finding the fault proneness of
the Object-Oriented systems. The contribution of this paper is that it
has used Metric values of JEdit open source software for generation
of the rules for the categorization of software modules in the
categories of Faulty and non faulty modules and thereafter
empirically validation is performed. The results are measured in
terms of accuracy of prediction, probability of Detection and
Probability of False Alarms.
Abstract: The weighting exponent m is called the fuzzifier that
can have influence on the clustering performance of fuzzy c-means
(FCM) and mÎ[1.5,2.5] is suggested by Pal and Bezdek [13]. In this
paper, we will discuss the robust properties of FCM and show that the
parameter m will have influence on the robustness of FCM. According
to our analysis, we find that a large m value will make FCM more
robust to noise and outliers. However, if m is larger than the theoretical
upper bound proposed by Yu et al. [14], the sample mean will become
the unique optimizer. Here, we suggest to implement the FCM
algorithm with mÎ[1.5,4] under the restriction when m is smaller
than the theoretical upper bound.
Abstract: In the past few years, the use of wireless sensor networks (WSNs) potentially increased in applications such as intrusion detection, forest fire detection, disaster management and battle field. Sensor nodes are generally battery operated low cost devices. The key challenge in the design and operation of WSNs is to prolong the network life time by reducing the energy consumption among sensor nodes. Node clustering is one of the most promising techniques for energy conservation. This paper presents a novel clustering algorithm which maximizes the network lifetime by reducing the number of communication among sensor nodes. This approach also includes new distributed cluster formation technique that enables self-organization of large number of nodes, algorithm for maintaining constant number of clusters by prior selection of cluster head and rotating the role of cluster head to evenly distribute the energy load among all sensor nodes.
Abstract: Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace Clustering algorithms automatically identify lower dimensional subspaces of the higher dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC – Intelligent Subspace Clustering, which tries to overcome three major limitations of the existing state-of-art techniques. ISC determines the input parameter such as є – distance at various levels of Subspace Clustering which helps in finding meaningful clusters. The uniform parameters approach is not suitable for different kind of databases. ISC implements dynamic and adaptive determination of Meaningful clustering parameters based on hierarchical filtering approach. Third and most important feature of ISC is the ability of incremental learning and dynamic inclusion and exclusions of subspaces which lead to better cluster formation.
Abstract: Due to the limited energy resources, energy efficient operation of sensor node is a key issue in wireless sensor networks. Clustering is an effective method to prolong the lifetime of energy constrained wireless sensor network. However, clustering in wireless sensor network faces several challenges such as selection of an optimal group of sensor nodes as cluster, optimum selection of cluster head, energy balanced optimal strategy for rotating the role of cluster head in a cluster, maintaining intra and inter cluster connectivity and optimal data routing in the network. In this paper, we propose a protocol supporting an energy efficient clustering, cluster head selection/rotation and data routing method to prolong the lifetime of sensor network. Simulation results demonstrate that the proposed protocol prolongs network lifetime due to the use of efficient clustering, cluster head selection/rotation and data routing.
Abstract: K-Means (KM) is considered one of the major
algorithms widely used in clustering. However, it still has some
problems, and one of them is in its initialization step where it is
normally done randomly. Another problem for KM is that it
converges to local minima. Genetic algorithms are one of the
evolutionary algorithms inspired from nature and utilized in the field
of clustering. In this paper, we propose two algorithms to solve the
initialization problem, Genetic Algorithm Initializes KM (GAIK) and
KM Initializes Genetic Algorithm (KIGA). To show the effectiveness
and efficiency of our algorithms, a comparative study was done
among GAIK, KIGA, Genetic-based Clustering Algorithm (GCA),
and FCM [19].
Abstract: Data gathering is an essential operation in wireless
sensor network applications. So it requires energy efficiency
techniques to increase the lifetime of the network. Similarly,
clustering is also an effective technique to improve the energy
efficiency and network lifetime of wireless sensor networks. In this
paper, an energy efficient cluster formation protocol is proposed with
the objective of achieving low energy dissipation and latency without
sacrificing application specific quality. The objective is achieved by
applying randomized, adaptive, self-configuring cluster formation
and localized control for data transfers. It involves application -
specific data processing, such as data aggregation or compression.
The cluster formation algorithm allows each node to make
independent decisions, so as to generate good clusters as the end.
Simulation results show that the proposed protocol utilizes minimum
energy and latency for cluster formation, there by reducing the
overhead of the protocol.
Abstract: Data clustering is an important data exploration
technique with many applications in data mining. The k-means
algorithm is well known for its efficiency in clustering large data
sets. However, this algorithm is suitable for spherical shaped clusters
of similar sizes and densities. The quality of the resulting clusters
decreases when the data set contains spherical shaped with large
variance in sizes. In this paper, we introduce a competent procedure
to overcome this problem. The proposed method is based on shifting
the center of the large cluster toward the small cluster, and recomputing
the membership of small cluster points, the experimental
results reveal that the proposed algorithm produces satisfactory
results.
Abstract: In terms of total online audience, newspapers are the most successful form of online content to date. The online audience for newspapers continues to demand higher-quality services, including personalized news services. News providers should be able to offer suitable users appropriate content. In this paper, a news article recommender system is suggested based on a user-s preference when he or she visits an Internet news site and reads the published articles. This system helps raise the user-s satisfaction, increase customer loyalty toward the content provider.
Abstract: Searching similar documents and document
management subjects have important place in text mining. One of the
most important parts of similar document research studies is the
process of classifying or clustering the documents. In this study, a
similar document search approach that includes discussion of out the
case of belonging to multiple categories (multiple categories
problem) has been carried. The proposed method that based on Fuzzy
Similarity Classification (FSC) has been compared with Rocchio
algorithm and naive Bayes method which are widely used in text
mining. Empirical results show that the proposed method is quite
successful and can be applied effectively. For the second stage,
multiple categories vector method based on information of categories
regarding to frequency of being seen together has been used.
Empirical results show that achievement is increased almost two
times, when proposed method is compared with classical approach.
Abstract: Microarrays have become the effective, broadly used tools in biological and medical research to address a wide range of problems, including classification of disease subtypes and tumors. Many statistical methods are available for analyzing and systematizing these complex data into meaningful information, and one of the main goals in analyzing gene expression data is the detection of samples or genes with similar expression patterns. In this paper, we express and compare the performance of several clustering methods based on data preprocessing including strategies of normalization or noise clearness. We also evaluate each of these clustering methods with validation measures for both simulated data and real gene expression data. Consequently, clustering methods which are common used in microarray data analysis are affected by normalization and degree of noise and clearness for datasets.
Abstract: Microarrays technique allows the simultaneous measurements of the expression levels of thousands of mRNAs. By mining this data one can identify the dynamics of the gene expression time series. By recourse of principal component analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles from Cyanobacterium Synechocystis. We applied PCA to reduce the dimensionality of the data set. Examination of the components also provides insight into the underlying factors measured in the experiments. Our results suggest that all rhythmic content of data can be reduced to three main components.
Abstract: Clustering unstructured text documents is an
important issue in data mining community and has a number of
applications such as document archive filtering, document
organization and topic detection and subject tracing. In the real
world, some of the already clustered documents may not be of
importance while new documents of more significance may evolve.
Most of the work done so far in clustering unstructured text
documents overlooks this aspect of clustering. This paper, addresses
this issue by using the Fading Function. The unstructured text
documents are clustered. And for each cluster a statistics structure
called Cluster Profile (CP) is implemented. The cluster profile
incorporates the Fading Function. This Fading Function keeps an
account of the time-dependent importance of the cluster. The work
proposes a novel algorithm Clustering n-ary Merge Algorithm
(CnMA) for unstructured text documents, that uses Cluster Profile
and Fading Function. Experimental results illustrating the
effectiveness of the proposed technique are also included.