Abstract: In this paper, a one-dimensional Self-Organizing Map
(SOM) algorithm for feature selection is presented. The
algorithm is based on a first classification of the input dataset on a
similarity space. From this classification, a set of positive and
negative features is computed for each class. This set of features is
selected as the result of the procedure. The procedure is evaluated on an
in-house dataset from a Knowledge Discovery from Text (KDT)
application and on a set of publicly available datasets used in
international feature selection competitions. These datasets come
from KDT applications, drug discovery as well as other applications.
The knowledge of the correct classification available for the training
and validation datasets is used to optimize the parameters for positive
and negative feature extraction. The process becomes feasible for
large and sparse datasets, such as the ones obtained in KDT
applications, by using both compression techniques to store the
similarity matrix and speed-up techniques for the Kohonen algorithm
that take advantage of the sparsity of the input matrix. These
improvements make it feasible, by using the grid, to apply the
methodology to massive datasets.
Abstract: In this study, a network quality of service (QoS)
evaluation system was proposed. The system used a combination of
fuzzy C-means (FCM) and a regression model to analyse and assess the
QoS in a simulated network. Network QoS parameters of multimedia
applications were intelligently analysed by the FCM clustering
algorithm. The QoS parameters for each FCM cluster centre were
then input to a regression model in order to quantify the overall
QoS. The proposed QoS evaluation system provided valuable
information about the network's QoS patterns and, based on this
information, the overall network's QoS was effectively quantified.
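The clustering step described above can be illustrated with a minimal fuzzy C-means sketch. It is a textbook formulation and not the authors' implementation: the fuzzifier `m`, the deterministic initialisation, and the function name `fuzzy_c_means` are assumptions made for illustration, and the regression stage is not shown.

```python
import math

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6):
    """Textbook fuzzy C-means; returns (centres, memberships).

    points: list of equal-length numeric tuples (e.g. hypothetical QoS
    feature vectors); c: number of clusters; m: fuzzifier (> 1).
    """
    n, dim = len(points), len(points[0])
    # Assumption: deterministic start, centres spread over the input order.
    centres = [list(points[(i * (n - 1)) // max(c - 1, 1)]) for i in range(c)]
    u = []
    for _ in range(iters):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        u = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centres]
            inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
            s = sum(inv)
            u.append([v / s for v in inv])
        # Centre update: mean of the points weighted by u_ij^m.
        new_centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            new_centres.append(
                [sum(w[i] * points[i][k] for i in range(n)) / sw for k in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, u
```

Each row of the returned membership matrix sums to one, and the cluster centres are the values that would be passed on to a downstream model such as the regression stage mentioned in the abstract.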
Abstract: Fuzzy Cognitive Maps (FCMs) have successfully
been applied in numerous domains to show relations between
essential components. Some FCMs contain many interrelated nodes,
and more nodes mean more complex system behavior and analysis.
In this paper, a novel learning method is used to construct FCMs
from historical data, and, using data mining and the DEMATEL
method, a new method is defined to reduce the number of nodes.
This method clusters the nodes of an FCM based on their
cause-and-effect behaviors.
Abstract: Most biclustering/projected clustering algorithms are based on either the Euclidean distance or the correlation coefficient, which capture only linear relationships. However, in many applications, such as gene expression data and word-document data, nonlinear relationships may exist between the objects. Mutual information between two variables provides a more general criterion for investigating dependencies amongst variables. In this paper, we improve upon our previous mutual-information-based biclustering algorithm in terms of both computation time and the types of clusters identified. The algorithm is able to find biclusters with mixed relationships and is faster than the previous one. To the best of our knowledge, no other existing biclustering algorithm has used mutual information as a similarity measure. We present experimental results on synthetic data as well as on yeast expression data. Biclusters on the yeast data were found to be biologically and statistically significant using GO Tool Box and FuncAssociate.
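To illustrate why mutual information can capture relationships that a correlation coefficient misses, the following minimal sketch compares the two on a small quadratic example. It only shows the similarity measure on discrete values; it is not the biclustering algorithm itself, and the function names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# y depends on x deterministically but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
```

For this data the correlation is zero while the mutual information is clearly positive, which is the kind of dependency the abstract refers to. Continuous data such as expression levels would first need discretisation or a density estimate, which this sketch leaves out.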
Abstract: Mobile ad-hoc networks (MANETs) are a form of
wireless network that does not require a base station to provide
network connectivity. MANETs have many characteristics that
distinguish them from other wireless networks and that make
routing in such networks a challenging task. Cluster-based
routing is one of the routing schemes for MANETs, in which
clusters of mobile nodes are formed, each with its own
clusterhead that is responsible for routing among clusters.
In this paper, we propose and implement a distributed
weighted clustering algorithm for MANETs. The approach is based
on a combined weight metric that takes into account several system
parameters, such as the node degree, transmission range, energy, and
mobility of the nodes. We evaluate the performance of the
proposed scheme through simulation in various network situations.
Simulation results show that the proposed scheme outperforms the
original distributed weighted clustering algorithm (DWCA).
Abstract: Today, money laundering (ML) poses a serious threat
not only to financial institutions but also to the nation. This criminal
activity is becoming more and more sophisticated and seems to have
moved from the cliché of drug trafficking to financing terrorism and
surely not forgetting personal gain. Most international financial
institutions have been implementing anti-money laundering (AML)
solutions to fight investment fraud. However, traditional investigative
techniques consume numerous man-hours. Recently, data mining
approaches have been developed and are considered well-suited
techniques for detecting ML activities. Within the scope of a
collaboration project to develop a new solution for
the AML units of an international investment bank, we proposed a
data mining-based solution for AML. In this paper, we present a
heuristic approach to improve the performance of this solution. We
also show some preliminary results obtained with this method when
analysing transaction datasets.
Abstract: Color image segmentation can be considered as a
clustering procedure in feature space. k-means and its adaptive
version, i.e. the competitive learning approach, are powerful tools
for data clustering. However, k-means and competitive learning
suffer from several drawbacks, such as the dead-unit problem and
the need to pre-specify the number of clusters. In this paper, we
explore the use of the competitive and cooperative learning (CCL)
approach to perform color image segmentation. In CCL, seed points
not only compete with each other; the winner also dynamically
selects several of its nearest competitors to form a cooperative
team that adapts to the input together. As a result, CCL can
automatically select the correct number of clusters and avoid the
dead-unit problem. Experimental results show that CCL can obtain
better segmentation results.
Abstract: The Neuro-Fuzzy hybridization scheme has attracted
research interest in pattern classification over the past decade. The
present paper proposes a novel Modified Adaptive Fuzzy Inference
Engine (MAFIE) for pattern classification. A modified Apriori
algorithm is used to reduce the decision rules to a minimal set
based on input-output data sets. A TSK-type fuzzy inference
system is constructed by automatically generating membership
functions with fuzzy c-means clustering and rules with the Apriori
algorithm. The generated adaptive fuzzy
inference engine is tuned by a least-squares fit and a conjugate
gradient descent algorithm towards better performance with a
minimal set of rules. The proposed MAFIE is able to reduce the
number of rules, which increases exponentially as more input
variables are involved. The performance of the proposed MAFIE is
compared with other existing pattern classification
schemes using Fisher's Iris and Wisconsin breast cancer data sets and
is shown to be very competitive.
Abstract: A Decision Support System/Expert System for stock
portfolio selection is presented. In the first step, both technical and
fundamental data are used to estimate technical and fundamental
return and risk (1st phase); then, the estimated values are aggregated
with the investor's preferences (2nd phase) to produce a suitable
stock portfolio.
In the 1st phase, there are two expert systems, one responsible for
technical estimation and the other for fundamental estimation. In the
technical expert system, twenty-seven candidate variables are
identified for each stock, and the effective variables are selected
using a rough sets-based clustering (RC) method. Next, two fuzzy
rule-bases are developed for each stock with the fuzzy C-means
method and the Takagi-Sugeno-Kang (TSK) approach: one for return
estimation and the other for risk. Thereafter, the parameters of the
rule-bases are tuned with the back-propagation method. In parallel,
for the fundamental expert system, fuzzy rule-bases are identified in
the form of "IF-THEN" rules through brainstorming with stock market
experts, and the input data are derived from financial statements; as
a result, two fuzzy rule-bases are generated for all the stocks, one
for return and the other for risk.
In the 2nd phase, user preferences are represented by four criteria
and are obtained by questionnaire. Using an expert system, the four
estimated values of return and risk are aggregated with the
respective values of user preference. Finally, a fuzzy rule-base with
four rules processes these values and produces a ranking score for
each stock, which leads to a satisfactory portfolio for the user.
The stocks of six manufacturing companies over the period
2003-2006 were selected for data gathering.
Abstract: This paper describes a novel approach for deriving
modules from protein-protein interaction networks, which combines
functional information with topological properties of the network.
This approach is based on a weighted clustering coefficient, which
uses weights representing the functional similarities between the
proteins. These weights are calculated according to the semantic
similarity between the proteins, which is based on their Gene
Ontology terms. We recently proposed an algorithm for identification
of functional modules, called SWEMODE (Semantic WEights for
MODule Elucidation), that identifies dense sub-graphs containing
functionally similar proteins. The rationale underlying this approach is
that each module can be reduced to a set of triangles (protein triplets
connected to each other). Here, we propose considering semantic
similarity weights of all triangle-forming edges between proteins. We
also apply varying semantic similarity thresholds between
neighbours of each node that are not neighbours of each other (and
thereby do not form a triangle), to derive new potential triangles to
include in the module-defining procedure. The results show an
improvement over the purely topological approach, in terms of the
number of predicted modules that match known complexes.
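The idea of weighting every triangle-forming edge can be sketched with a weighted clustering coefficient. The abstract does not give the exact formula used by SWEMODE, so the sketch below substitutes the widely used geometric-mean definition by Onnela et al.; the graph representation, the function name, and the assumption that similarity weights lie in (0, 1] are all illustrative choices.

```python
from itertools import combinations

def weighted_clustering(adj, node):
    """Geometric-mean (Onnela-style) weighted clustering coefficient of `node`.

    adj: dict mapping each node to a dict {neighbour: weight}, symmetric,
    with weights assumed to be similarity scores in (0, 1].
    """
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    total = 0.0
    for u, v in combinations(nbrs, 2):
        if v in adj[u]:  # node, u and v close a triangle
            # Geometric mean of the three edge weights of the triangle.
            total += (adj[node][u] * adj[node][v] * adj[u][v]) ** (1.0 / 3.0)
    # Normalise by the number of neighbour pairs, k * (k - 1) / 2.
    return 2.0 * total / (k * (k - 1))

# Hypothetical protein triplet with uniform semantic similarity 0.5,
# plus one extra neighbour of A that closes no triangle.
graph = {
    "A": {"B": 0.5, "C": 0.5, "D": 0.5},
    "B": {"A": 0.5, "C": 0.5},
    "C": {"A": 0.5, "B": 0.5},
    "D": {"A": 0.5},
}
```

With all weights equal to 1 the measure reduces to the ordinary topological clustering coefficient, so the weights only ever lower the score of triangles whose proteins are functionally dissimilar.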
Abstract: In this paper, we present a novel approach to accurately
detect text regions, including shop names, in signboard images with
complex backgrounds for mobile system applications. The proposed
method is based on the combination of text detection using edge
profiles and region segmentation using the fuzzy c-means method. In
the first step, we apply an elaborate Canny edge operator to extract
all possible object edges. Then, edge profile analysis in the vertical
and horizontal directions is performed on these edge pixels to detect
potential text regions containing the shop name in a signboard. The
edge profile and geometrical characteristics of each object contour are
carefully examined to construct candidate text regions and to separate
the main text region from the background. Finally, the fuzzy c-means
algorithm is applied to segment and binarize the detected text region.
Experimental results show that our proposed method is robust in text
detection with respect to different character sizes and colors and can
provide reliable text binarization results.
Abstract: A new dynamic clustering approach (DCPSO), based
on Particle Swarm Optimization, is proposed. This approach is
applied to unsupervised image classification. The proposed approach
automatically determines the "optimum" number of clusters and
simultaneously clusters the data set with minimal user interference.
The algorithm starts by partitioning the data set into a relatively large
number of clusters to reduce the effects of initial conditions. Using
binary particle swarm optimization the "best" number of clusters is
selected. The centers of the chosen clusters are then refined via the
K-means clustering algorithm. The experiments conducted show that
the proposed approach generally found the "optimum" number of
clusters on the tested images.
Abstract: This paper presents four unsupervised clustering algorithms, namely sIB, RandomFlatClustering, FarthestFirst, and FilteredClusterer, that have not previously been used for network traffic classification. The methodology, the results, the resulting clusters, and an evaluation of the accuracy of each algorithm are presented. In addition, the efficiency of these algorithms is assessed in terms of the time each takes to generate clusters quickly and correctly. Our work studies and tests which algorithm performs best at classifying traffic anomalies in network traffic, using attributes that have not been used before. We analyse the algorithm with the best efficiency or the best learning and compare it to the previously used K-Means. Our research will be used to develop more efficient anomaly detection systems that meet future requirements.
Abstract: Color image segmentation plays an important role in
computer vision and image processing areas. In this paper, the
features of the Volterra filter are utilized for color image segmentation.
The discrete Volterra filter exhibits both linear and nonlinear
characteristics. The linear part smooths the image features in
uniform gray zones and is used for getting a gross representation of
objects of interest. The nonlinear term compensates for the blurring
due to the linear term and preserves the edges which are mainly used
to distinguish the various objects. Truncated quadratic Volterra
filters are mainly used for edge preservation along with Gaussian noise
cancellation. In our approach, the segmentation is based on the K-means
clustering algorithm in HSI space. Both the hue and the intensity
components are fully utilized. For hue clustering, the special cyclic
property of the hue component is taken into consideration. The
experimental results show that the proposed technique segments the
color image while preserving significant features and removing noise
effects.
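The cyclic property of hue mentioned above matters because hue is an angle: 350 degrees and 10 degrees are close, although their arithmetic difference is large. A minimal sketch of a hue-aware distance and mean, as a K-means variant might use them, is given below. It assumes hue is expressed in degrees; the function names are hypothetical and the abstract does not specify how the authors handle the wrap-around.

```python
import math

def hue_distance(h1, h2):
    """Shortest angular distance between two hues given in degrees."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def circular_mean(hues):
    """Mean hue computed on the unit circle, in degrees wrapped to [0, 360)."""
    s = sum(math.sin(math.radians(h)) for h in hues)
    c = sum(math.cos(math.radians(h)) for h in hues)
    return math.degrees(math.atan2(s, c)) % 360.0

def assign_to_centres(hues, centres):
    """Index of the nearest hue centre for each hue (the K-means assignment step)."""
    return [
        min(range(len(centres)), key=lambda j: hue_distance(h, centres[j]))
        for h in hues
    ]
```

Replacing the Euclidean distance and arithmetic mean of ordinary K-means with these two functions keeps reddish hues on both sides of the 0/360 boundary in the same cluster.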
Abstract: In this paper, a comparative study of the application of
supervised and unsupervised learning algorithms to
illumination-invariant face recognition is carried out. Supervised
learning is carried out using a bi-layered artificial neural network
with one input layer, two hidden layers, and one output layer.
Gradient descent with momentum and adaptive learning-rate
back-propagation is used to implement the supervised learning in
such a way that both the inputs and the corresponding outputs are
provided when training the network; thus there is inherent clustering
and optimized learning of weights, which provides efficient results.
The unsupervised learning is implemented using a modified
Counterpropagation network. The Counterpropagation network
involves clustering followed by application of the Outstar
rule to obtain the recognized face. The face recognition system is
developed for recognizing faces with varying illumination
intensities, where the database images vary in lighting
with respect to the angle of illumination in the horizontal and
vertical planes. The supervised and unsupervised learning algorithms
are implemented and tested exhaustively, with and
without histogram equalization, to get efficient results.
Abstract: Image clustering is the process of grouping images
based on their similarity. Image clustering usually uses color,
texture, edge, or shape features, or a mixture of two of these.
This research aims to explore image clustering using color
composition. To perform this image clustering, three main
components should be considered: the color space, the image
representation (feature extraction), and the clustering method itself.
We aim to explore which combination of these factors produces the
best clustering results by combining various techniques from the
three components. The color spaces are RGB, HSV, and L*a*b*.
The image representations are Histogram and Gaussian
Mixture Model (GMM), whereas the clustering methods are K-Means
and the Agglomerative Hierarchical Clustering algorithm. The
results of the experiment show that the GMM representation is better
combined with the RGB and L*a*b* color spaces, whereas Histogram is
better combined with HSV. The experiments also show that K-Means
is better than Agglomerative Hierarchical Clustering for image
clustering.
Abstract: Wireless Sensor Networks (WSNs) can be used to monitor
physical phenomena in areas where human access is nearly
impossible. Because sensor nodes use non-rechargeable batteries, the
limited power supply is the major constraint of WSNs. Much research
is under way to reduce the energy consumption of sensor nodes. An
energy map can be used with clustering, data dissemination, and
routing techniques to reduce the power consumption of WSNs. An
energy map can also be used to predict which part of the network is
going to fail in the near future. In this paper, an energy map is
constructed using a prediction-based approach, with the adaptive
alpha GM(1,1) model as the prediction model. GM(1,1) is used
worldwide in many applications for predicting future values of a
time series from past values, owing to its high computational
efficiency and accuracy.
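The prediction step can be sketched with the textbook GM(1,1) grey model. The abstract does not describe how alpha is adapted, so the sketch simply exposes `alpha` as a parameter of the background value (0.5 gives the classic model); the function names and the sample readings are hypothetical.

```python
import math

def gm11_fit(x0, alpha=0.5):
    """Estimate the GM(1,1) parameters (a, b) from a positive series x0."""
    # Accumulated generating operation: x1[k] = x0[0] + ... + x0[k].
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values z[k] = alpha * x1[k] + (1 - alpha) * x1[k-1].
    z = [alpha * x1[k] + (1.0 - alpha) * x1[k - 1] for k in range(1, len(x0))]
    y = x0[1:]
    # Least-squares fit of x0[k] = -a * z[k] + b (simple linear regression).
    m = len(z)
    u = [-zk for zk in z]
    su, sy = sum(u), sum(y)
    suu = sum(ui * ui for ui in u)
    suy = sum(ui * yi for ui, yi in zip(u, y))
    a = (m * suy - su * sy) / (m * suu - su * su)
    b = (sy - a * su) / m
    return a, b

def gm11_predict(x0, steps, alpha=0.5):
    """Forecast the next `steps` values of x0 (assumes the fitted a is non-zero)."""
    a, b = gm11_fit(x0, alpha)
    n = len(x0)

    def x1_hat(k):
        # Time response of the accumulated series; k = 0 is the first sample.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse accumulation recovers forecasts of the original series.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]

# Hypothetical residual-energy readings of one sensor node.
readings = [100.0, 90.0, 81.0, 72.9, 65.61]
```

Calling `gm11_predict(readings, 1)` extrapolates the decay of this node by one step; an energy map would repeat the forecast for every node, and an adaptive scheme would additionally tune `alpha` per series.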
Abstract: We present a non-standard Euclidean vehicle
routing problem that adds a level of clustering, and we revisit the use
of self-organizing maps as a tool which naturally handles such
problems. We show how they can be used as the main operator
within an evolutionary algorithm to address the two conflicting
objectives of minimizing route length and minimizing the distance
from customers to bus stops, and to deal with capacity constraints.
We apply the approach to a real-life case of combined clustering and
vehicle routing for the transportation of the 780 employees of an
enterprise. Based on a geographic information system, we
discuss the influence of road infrastructure on the solutions
generated.
Abstract: This paper presents a supervised clustering algorithm,
namely Grid-Based Supervised Clustering (GBSC), which is able to
identify clusters of any shape and size without presuming any
canonical form for the data distribution. The GBSC needs no
pre-specified number of clusters, is insensitive to the order of the
input data objects, and is capable of handling outliers. Built on the
combination of grid-based clustering and density-based clustering,
with the assistance of the downward closure property of density
used in bottom-up subspace clustering, the GBSC can notably reduce
its search space to avoid memory confinement during its
execution. On two-dimensional synthetic datasets, the GBSC can
correctly identify clusters with different shapes and sizes. The GBSC
also outperforms five other supervised clustering algorithms when
the experiments are performed on some UCI datasets.
Abstract: Clustering methods developed in data mining theory
can be successfully applied to the investigation of
different kinds of dependencies between environmental conditions
and human activities. It is known that environmental
parameters such as temperature, relative humidity, atmospheric
pressure, and illumination have significant effects on human
mental performance. To investigate the effect of these parameters, a
data mining technique of clustering using entropy and the Information
Gain Ratio (IGR) K(Y/X) = (H(X) − H(Y/X))/H(Y) is used, where
H(Y) = −Σ_i P_i ln(P_i). This technique allows the boundaries of
clusters to be adjusted. It is shown that the IGR grows
monotonically with the degree of connectivity
between two variables. This approach has some advantages
compared, for example, with correlation analysis, owing to its
relatively lower sensitivity to the shape of functional dependencies.
A variant of an algorithm implementing the proposed method, with
some analysis of the above problem of environmental effects, is also
presented. It is shown that the proposed method converges in a finite
number of steps.