Abstract: Biologically, the human brain processes information in both unimodal and multimodal ways. Information is progressively abstracted and seamlessly fused, and the fusion of multimodal inputs allows a holistic understanding of a problem. The proliferation of technology has produced an exponential growth in sources of data, which can be likened to the multimodal state of the human brain. This is our inspiration for developing a methodology for exploring multimodal data and identifying multi-view patterns. Specifically, we propose a brain-inspired conceptual model that allows the exploration and identification of patterns at different levels of granularity, different types of hierarchies and different types of modalities. A structurally adaptive neural network is deployed to implement the proposed model. Furthermore, the acquisition of multi-view patterns with the proposed model is demonstrated and discussed with some experimental results.
Abstract: With the explosive growth of data available on the Internet, personalization of this information space becomes a necessity. At present, with the rapidly increasing popularity of the WWW, websites play a crucial role in conveying knowledge and information to end users. Discovering hidden and meaningful information about Web users' usage patterns is critical for determining effective marketing strategies and for optimizing Web server usage to accommodate future growth. The task of mining useful information becomes more challenging when the Web traffic volume is enormous and keeps on growing. In this paper, we propose an intelligent model to discover and analyze useful knowledge from the available Web log data.
Abstract: Intelligent systems are required in order to quickly and accurately analyze enormous quantities of data in the Internet environment. In intelligent systems, information extraction processes can be divided into supervised learning and unsupervised learning. This paper investigates intelligent clustering by unsupervised learning. Intelligent clustering is a clustering system that determines the clustering model for data analysis and evaluates the results by itself. Such a system can build a clustering model more rapidly, objectively and accurately than a human analyst. The methodology for the automatic clustering intelligent system is a multi-agent system comprising a clustering agent and a cluster performance evaluation agent. The agents exchange information about clusters, and the system determines the optimal cluster number through this information. Experiments using data sets from the UCI Machine Learning Repository are performed in order to demonstrate the validity of the system.
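As a rough illustration of this agent interaction, consider a minimal single-process sketch (not the authors' implementation) in which a clustering "agent" proposes partitions for candidate cluster counts and an evaluation "agent" scores them; the silhouette score and the Iris data set are illustrative assumptions.

```python
# Minimal sketch: a clustering agent proposes partitions, an evaluation
# agent scores them, and the optimal cluster number is selected automatically.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.datasets import load_iris

def clustering_agent(X, k):
    """Propose a partition of X into k clusters."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

def evaluation_agent(X, labels):
    """Score a proposed partition (higher is better); silhouette is an assumption."""
    return silhouette_score(X, labels)

X = load_iris().data  # a UCI data set, as in the paper's experiments
scores = {k: evaluation_agent(X, clustering_agent(X, k)) for k in range(2, 8)}
best_k = max(scores, key=scores.get)
print(f"selected cluster number: {best_k}")
```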
Abstract: In data mining, fuzzy clustering algorithms have demonstrated an advantage over crisp clustering algorithms in dealing with the challenges posed by large collections of vague and uncertain natural data. This paper reviews the concepts of fuzzy logic and fuzzy clustering. The classical fuzzy c-means algorithm is presented and its limitations are highlighted. Based on the study of the fuzzy c-means algorithm and its extensions, we propose a modification to the c-means algorithm to overcome its limitations in calculating the new cluster centers and in finding the membership values for natural data. The efficiency of the new modified method is demonstrated on real data collected for Bhutan's Gross National Happiness (GNH) program.
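For reference, here is a minimal sketch of the classical fuzzy c-means updates that the paper starts from (fuzzifier m and Euclidean distance); the paper's modified center and membership computations are not reproduced here.

```python
# Classical fuzzy c-means: alternate between weighted center updates and
# the standard membership update until convergence (fixed iterations here).
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, eps=1e-9):
    rng = np.random.default_rng(0)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)        # standard membership update
    return centers, U
```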
Abstract: The self-organizing map (SOM) is a well-known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters among the code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. The PSO algorithm utilizes the U-matrix of the SOM to determine cluster boundaries; the results of this novel automatic method correspond well to the boundaries found through visual inspection of the code vectors and through the k-means algorithm.
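As background for this approach, the following is a minimal sketch of computing a U-matrix from a rectangular SOM's code vectors; the paper's PSO then searches this surface for cluster boundaries (the PSO itself is not shown).

```python
# U-matrix: for each map unit, the mean distance of its code vector to the
# code vectors of its grid neighbours; high values mark potential boundaries.
import numpy as np

def u_matrix(codebook):
    """codebook: (rows, cols, dim) array of SOM code vectors."""
    rows, cols, _ = codebook.shape
    U = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(codebook[i, j] - codebook[ni, nj]))
            U[i, j] = np.mean(dists)
    return U
```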
Abstract: Young patients suffering from cerebral palsy face difficult choices concerning major surgery. The diagnosis settled on by surgeons can be complex, and the patient's decision about whether or not to undergo such a surgery requires considerable reflection. The proposed software, which combines the prediction of suitable surgeries and of post-surgery kinematic values with a 3D model representing the patient, is an innovative tool helpful for both patients and medical professionals. Beginning with the analysis and classification of kinematic values, extracted from a gait-analysis database, into three separate clusters, it is possible to determine close similarity between patients. The surgery predicted to best improve a patient's gait is then determined by operating a suitably preconditioned neural network. Finally, a 3D model of the patient, based on the analysis of kinematic values, is animated using the post-surgery kinematic vectors characterizing the closest patient selected from the patient clustering.
Abstract: This paper presents a software quality support tool: a Java source code evaluator and code profiler based on computational intelligence techniques. It is a Java prototype developed by the AI Group [1] at the Research Laboratories of Universidad de Palermo: an Intelligent Java Analyzer (in Spanish: Analizador Java Inteligente, AJI). It represents a new approach to evaluating and identifying inaccurate source code usage and, transitively, to evaluating the software product itself.
The aim of this project is to provide the software development industry with a new tool to increase software quality by extending the value of source code metrics through computational intelligence.
Abstract: Duplicated region detection is a technique for exposing copy-paste forgeries in digital images. Copy-paste is one of the common types of forgery, cloning a portion of an image in order to conceal or duplicate a specific object. In this type of forgery detection, extracting robust block features and the high time complexity of the matching step are two main open problems. This paper concentrates on computational time and proposes a local block matching algorithm based on block clustering to improve time complexity. The time complexity of the proposed algorithm is formulated, and the effects of two parameters, block size and number of clusters, on the efficiency of the algorithm are considered. The experimental results and mathematical analysis demonstrate that this algorithm is more cost-effective than lexicographic sorting algorithms in terms of time complexity when the image is complex.
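The local matching idea can be sketched as follows: cluster block features first so that exhaustive comparison happens only within each cluster rather than globally. This is a simplified illustration, not the paper's algorithm; the per-block feature (mean and variance) and the matching threshold are illustrative assumptions.

```python
# Cluster image blocks by cheap features, then match candidate duplicate
# blocks only inside each cluster to cut the pairwise comparison cost.
import numpy as np
from sklearn.cluster import KMeans

def candidate_pairs(image, block=16, n_clusters=8, thresh=1e-3):
    h, w = image.shape
    feats, coords = [], []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            b = image[y:y + block, x:x + block]
            feats.append([b.mean(), b.var()])      # cheap per-block feature
            coords.append((y, x))
    feats = np.asarray(feats)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(feats)
    pairs = []
    for c in range(n_clusters):                    # match only within a cluster
        idx = np.flatnonzero(labels == c)
        for i in range(len(idx)):
            for j in range(i + 1, len(idx)):
                if np.sum((feats[idx[i]] - feats[idx[j]]) ** 2) < thresh:
                    pairs.append((coords[idx[i]], coords[idx[j]]))
    return pairs
```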
Abstract: Automatic reusability appraisal can be helpful in evaluating the quality of developed or developing reusable software components and in identifying reusable components in existing legacy systems, which can save the cost of developing the software from scratch. However, the issue of how to identify reusable components in existing systems has remained relatively unexplored. In this paper, we present a two-tier approach that studies the structural attributes of a component as well as its usability or relevancy to a particular domain. Latent semantic analysis is used for the feature vector representation of the various software domains. It exploits the fact that feature vectors can be seen as documents containing terms (the identifiers present in the components), so text modeling methods that capture co-occurrence information in low-dimensional spaces can be used. Further, we devised a Neuro-Fuzzy hybrid inference system, which takes structural metric values as input and calculates the reusability of the software component. A decision tree algorithm is used to decide the initial set of fuzzy rules for the Neuro-Fuzzy system. The results obtained are convincing enough to propose the system for the economical identification and retrieval of reusable software components.
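The latent semantic analysis step can be sketched as follows: identifiers extracted from components are treated as terms, and truncated SVD yields low-dimensional feature vectors. The tiny corpus of identifier "documents" below is a hypothetical stand-in for real components.

```python
# LSA over identifier "documents": build a term-by-component count matrix,
# then reduce it with truncated SVD to capture co-occurrence structure.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

components = [
    "open read write close file buffer",    # e.g. an I/O component
    "socket connect send recv close",       # e.g. a networking component
    "matrix multiply transpose invert",     # e.g. a numeric component
]
X = CountVectorizer().fit_transform(components)    # term-by-component counts
vectors = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
print(vectors)    # low-dimensional feature vectors per component
```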
Abstract: The number of documents being created increases at an accelerating pace, while most of them cover already known topics and few of them introduce new concepts. This fact has started a new era in the information retrieval discipline, one with its own special requirements: digging into topics and concepts and finding subtopics or relations between topics. Until now, IR research has been interested in retrieving documents about a general topic or clustering documents under generic subjects. However, these conventional approaches cannot go deep into the content of documents, which makes it difficult for people to reach the right documents they are searching for. We therefore need new ways of mining document sets, where the critical point is to know much about the contents of the documents. As a solution, we propose to enhance LSI, one of the proven IR techniques, by supporting its vector space with n-gram forms of words. The positive results we have obtained are shown in two different application areas of the IR domain: querying a document database and clustering the documents in the database.
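One plausible reading of this enhancement can be sketched as follows: the LSI vector space is built over character n-grams of words rather than whole words only, and queries are answered by cosine similarity in the reduced space. The corpus, query, and n-gram range below are illustrative assumptions, not the paper's configuration.

```python
# LSI with an n-gram-augmented vector space: TF-IDF over character n-grams
# within word boundaries, reduced by truncated SVD, queried by cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

docs = ["neural network training", "network routing protocols",
        "training deep neural models", "routing tables and protocols"]
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))  # n-gram forms of words
lsi = make_pipeline(vec, TruncatedSVD(n_components=3, random_state=0))
doc_vecs = lsi.fit_transform(docs)
query_vec = lsi.transform(["neural net"])
print(cosine_similarity(query_vec, doc_vecs))   # ranks documents for the query
```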
Abstract: Clustering techniques have received attention in many areas, including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points that are close to one another. The K-means algorithm is one of the most widely used techniques for clustering. However, K-means has two shortcomings: it depends on the initial state and converges to local optima, and global solutions of large problems cannot be found with a reasonable amount of computational effort. Many clustering studies have been carried out to overcome the local optima problem. This paper presents an efficient hybrid evolutionary optimization algorithm, called PSO-ACO, that combines Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) for optimally clustering N objects into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handling data clustering.
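To make the PSO half of such a hybrid concrete, the following is a heavily simplified sketch of PSO applied to clustering: each particle encodes K candidate centroids and is scored by total distance to the nearest centroid. The ACO component of the paper's hybrid is not shown, and the inertia and acceleration constants are conventional defaults, not the paper's settings.

```python
# PSO for clustering: particles are sets of K centroids, fitness is the
# sum of distances from each point to its nearest centroid (lower is better).
import numpy as np

def pso_cluster(X, k=3, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    dim = X.shape[1]
    pos = rng.uniform(X.min(0), X.max(0), (n_particles, k, dim))
    vel = np.zeros_like(pos)

    def fitness(cents):
        d = np.linalg.norm(X[:, None, :] - cents[None], axis=2)
        return d.min(axis=1).sum()

    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        f = np.array([fitness(p) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()   # best centroids found so far
    return gbest
```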
Abstract: The lack of an inherent "natural" dissimilarity measure between objects in a categorical dataset presents special difficulties for clustering analysis. However, each categorical attribute of a given dataset provides natural probability and information in the sense of Shannon. In this paper, we propose a novel method that heuristically converts categorical attributes to numerical values by exploiting this associated information. We conduct an experimental study with a real-life categorical dataset, and the experiment demonstrates the effectiveness of our approach.
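The general idea can be sketched as follows: replace each categorical value by a number derived from its empirical probability, here its Shannon self-information, -log2 p. This is an illustration of the principle; the paper's exact conversion may differ.

```python
# Encode a categorical column numerically via the self-information of each
# value's empirical frequency: rare values map to larger numbers.
import numpy as np
from collections import Counter

def encode_column(values):
    n = len(values)
    counts = Counter(values)
    info = {v: -np.log2(c / n) for v, c in counts.items()}  # -log2 p(v)
    return np.array([info[v] for v in values])

col = ["red", "red", "blue", "green", "red", "blue"]
print(encode_column(col))
```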
Abstract: Wireless sensor networks consist of small battery-powered devices with limited energy resources. Once deployed, the small sensor nodes are usually inaccessible to the user, so replacement of the energy source is not feasible. Hence, energy efficiency is one of the most important issues that must be addressed in order to improve the life span of the network. Much research has been done to overcome this limitation, and clustering is one of the representative approaches: cluster heads gather data from their nodes and send it to the base station. In this paper, we introduce a dynamic clustering algorithm based on a genetic algorithm. The algorithm takes different parameters into consideration to increase the network lifetime. To demonstrate the efficiency of the proposed algorithm, we simulated it and compared it with the LEACH algorithm using MATLAB.
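The GA formulation for cluster-head selection can be sketched roughly as follows: a chromosome is a 0/1 mask over nodes (1 = cluster head). The fitness shown (residual energy of heads minus total node-to-head distance), the random node positions, and all GA parameters are illustrative assumptions, not the paper's model.

```python
# GA for one round of WSN cluster-head selection over a 100x100 field.
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 50
pos = rng.random((n_nodes, 2)) * 100          # node coordinates
energy = rng.uniform(0.5, 1.0, n_nodes)       # residual energy per node

def fitness(mask):
    heads = np.flatnonzero(mask)
    if len(heads) == 0:
        return -np.inf
    d = np.linalg.norm(pos[:, None] - pos[heads][None], axis=2).min(axis=1)
    return energy[heads].sum() - 0.05 * d.sum()   # assumed trade-off

popn = rng.integers(0, 2, (30, n_nodes))
for _ in range(100):
    f = np.array([fitness(m) for m in popn])
    parents = popn[np.argsort(f)[-10:]]                       # selection
    children = parents[rng.integers(0, 10, (30, n_nodes)),    # uniform crossover
                       np.arange(n_nodes)]
    flip = rng.random(children.shape) < 0.01                  # mutation
    popn = np.where(flip, 1 - children, children)
best = popn[np.argmax([fitness(m) for m in popn])]
print("cluster heads:", np.flatnonzero(best))
```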
Abstract: A model to improve the lifetime of a target tracking wireless sensor network is proposed. The model is a static cluster-based architecture with two aims: first, to increase the lifetime of the target tracking wireless sensor network; second, to enable good localization results with low energy consumption for each sensor in the network. The model consists of heterogeneous sensors, and each sensing member node in a cluster uses two operation modes, active mode and sleep mode. The performance results illustrate that the proposed architecture consumes less energy and achieves a longer lifetime than centralized and dynamic clustering architectures for target tracking sensor networks.
Abstract: A new Markovianity approach is introduced in this paper. This approach reduces the response time of the classic Markov Random Fields approach. First, one region is determined by a clustering technique and then excluded from the study. The remaining pixels form the study zone and are selected for the Markovian segmentation task. With the selective Markovianity approach, the segmentation process is faster than the classic one.
Abstract: Methods for organizing web data into groups in order to analyze web-based hypertext data and facilitate data availability are very important given the number of documents available online. The task of clustering web-based document structures therefore has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving the servicing of web users' requests, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts are represented as so-called generalized trees, which are more general than the usual directed rooted trees, e.g., DOM trees. As an important preprocessing step we measure the structural similarity between the generalized trees using a similarity measure d. Then, we apply agglomerative clustering to the obtained similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we run our approach on a data set of hypertext structures and obtain good results for Web Structure Mining. Furthermore, we outline the application of our approach to Web Usage Mining as future work.
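The clustering step can be sketched as follows: given a pairwise structural similarity matrix S for the generalized trees (as computed by the measure d, which is not shown), convert it to dissimilarities and apply agglomerative clustering. The matrix S below is an illustrative stand-in, and average linkage is an assumption.

```python
# Agglomerative clustering from a precomputed similarity matrix between
# hypertext graph structures: convert to distances, link, and cut into clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

S = np.array([[1.0, 0.9, 0.2, 0.1],
              [0.9, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.8],
              [0.1, 0.2, 0.8, 1.0]])   # similarity between 4 hypertext graphs
D = 1.0 - S                            # dissimilarity
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)    # e.g. [1 1 2 2]: two navigation-structure clusters
```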
Abstract: Cluster analysis divides data into groups that are meaningful, useful, or both. The analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. Linear relations are valuable in rule discovery for given data, e.g., "if value X goes up by 1, value Y will go down by 3". Classical linear regression models the linear relation of two sequences perfectly. However, if we need to cluster a large repository of protein sequences into groups where the sequences have strong linear relationships with each other, it is prohibitively expensive to compare the sequences one by one. In this paper, we propose a new technique, the General Regression Model Technique Clustering Algorithm (GRMTCA), to efficiently handle the problem of clustering linear sequences. GRMTCA provides a measure, GR*, that indicates the degree of linearity of multiple sequences without having to compare each pair of them.
Abstract: The fault-proneness of a software module is the probability that the module contains faults. Different techniques have been proposed to predict the fault-proneness of modules, including statistical methods, machine learning techniques, neural network techniques and clustering techniques. The aim of the proposed study is to explore whether metrics available in the early lifecycle (i.e., requirement metrics), metrics available in the late lifecycle (i.e., code metrics), and the combination of the two can be used to identify fault-prone modules using a Genetic Algorithm technique. This approach has been tested on real-time defect datasets from NASA software projects written in the C programming language. The results show that the fusion of requirement and code metrics yields the best prediction model for detecting faults, compared with the commonly used code-based model.
Abstract: The goal of this paper is to segment countries based on the value of their exports from Iran during the 14 years ending in 2005. To measure the dissimilarity among the export baskets of different countries, we define the Dissimilarity Export Basket (DEB) function and use this distance function in the K-means algorithm. The DEB function is defined based on the concepts of association rules and the value of exported group-commodities. In this paper, a clustering quality function and the clusters' intraclass inertia are defined in order to calculate the optimum number of clusters and to compare the performance of DEB versus the Euclidean distance, respectively. We also study the effects of importance weights in the DEB function on improving clustering quality. Lastly, when segmentation is complete, a designated RFM model is used to analyze the relative profitability of each cluster.
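The mechanics of plugging a custom dissimilarity into a K-means-style loop can be sketched as follows; the deb function below is a placeholder for the paper's DEB (which additionally uses association rules and importance weights), and medoids stand in for centers since an arbitrary dissimilarity has no closed-form mean.

```python
# K-means-style clustering with a custom dissimilarity: assign each point to
# its nearest medoid, then re-pick each medoid as the member minimizing the
# total dissimilarity to its cluster.
import numpy as np

def deb(a, b):
    # Placeholder dissimilarity between two export-value vectors; the paper's
    # DEB function is more elaborate.
    return np.abs(a - b).sum() / (a.sum() + b.sum())

def k_medoids(X, k, dist, n_iter=50):
    rng = np.random.default_rng(0)
    medoids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        D = np.array([[dist(x, m) for m in medoids] for x in X])
        labels = D.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):
                costs = [sum(dist(p, q) for q in members) for p in members]
                medoids[c] = members[int(np.argmin(costs))]
    return labels, medoids
```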
Abstract: In the literature, there are metrics for identifying the quality of reusable components, but a framework that makes use of these metrics to precisely predict the reusability of software components still needs to be worked out. These reusability metrics, if identified in the design phase or even in the coding phase, can help to reduce rework by improving the quality of reuse of the software component, and hence improve productivity due to a probabilistic increase in the reuse level. As the CK metric suite is the most widely used set of metrics for extracting the structural features of object-oriented (OO) software, this study uses a tuned CK metric suite, i.e., WMC, DIT, NOC, CBO and LCOM, to obtain a structural analysis of OO-based software components. An algorithm has been proposed in which the tuned metric values of an OO software component are given as input to a K-Means clustering system, and a decision tree is formed via 10-fold cross-validation of the data to evaluate the component in terms of a linguistic reusability value. The developed reusability model has produced high-precision results, as desired.
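A minimal sketch of this pipeline, under assumed data: K-Means groups components by their CK metric vectors, and a decision tree is cross-validated on the cluster-derived labels. The random metric values below are a stand-in for real component measurements, and the use of cluster labels as the reusability target is an interpretation of the abstract.

```python
# CK-metric pipeline sketch: cluster components by (WMC, DIT, NOC, CBO, LCOM),
# then 10-fold cross-validate a decision tree that predicts the cluster label.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 5))   # 200 hypothetical components x 5 CK metrics
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
tree = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(tree, X, labels, cv=10)   # 10-fold cross-validation
print("mean accuracy:", scores.mean())
```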