Abstract: In this paper, a hybrid modeling algorithm based on Fuzzy
C-Means clustering and the Expectation Maximization-Gaussian Mixture
Model is proposed for Continuous Tamil Speech
Recognition. Speech sentences from various speakers are used for the
training and testing phases, and objective measures are compared between
the proposed and existing Continuous Speech Recognition algorithms.
The simulation results show that the proposed algorithm improves
recognition accuracy and F-measure by up to 3% compared with the
existing algorithms for speech signals from various speakers. In
addition, it reduces the Word Error Rate, Error Rate and Error by up to
4% compared with the existing algorithms. Overall, the proposed hybrid
modeling for Tamil speech recognition provides significant improvements
for speech-to-text conversion in various applications.
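The EM half of the hybrid can be illustrated with a minimal sketch, assuming one-dimensional features and assuming the initial component means come from a prior Fuzzy C-Means run (here simply passed in as `init_means`, a hypothetical parameter). The loop is the standard Gaussian-mixture EM update, not the authors' implementation; real speech features such as MFCC vectors would need multivariate components.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, init_means, iterations=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM.
    Assumption: init_means are cluster centres from a previous FCM run."""
    k = len(init_means)
    means = list(init_means)
    mu = sum(data) / len(data)
    start_var = max(sum((x - mu) ** 2 for x in data) / len(data), min_var)
    variances = [start_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in data:
            probs = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(probs) or 1e-300
            resp.append([p / total for p in probs])
        # M-step: re-estimate mixing weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var)
    return weights, means, variances
```

Seeding EM with FCM centres rather than random means is the kind of combination the abstract describes; the variance floor `min_var` is an added safeguard against components collapsing onto a single sample.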
Abstract: The World Wide Web is a network with a complex topology
whose main properties are a power-law degree distribution, a low
clustering coefficient and a small average distance. Modeling the web
as a graph makes it possible to locate information quickly and thus
helps in the construction of search engines. Here, we present a model
based on existing probabilistic graphs that exhibits all of the
aforesaid characteristics. This work studies the structure of the web
so that it can be modeled more easily, and proposes a possible
algorithm for its exploration.
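The three properties named above can be measured directly on a graph. A minimal sketch, assuming the web graph is treated as undirected and stored as a hypothetical adjacency dict (node to set of neighbours), computes the degree distribution, the average local clustering coefficient and the average shortest-path distance by breadth-first search over reachable pairs:

```python
from collections import Counter, deque

def degree_distribution(adj):
    """Map each degree k to the number of nodes with degree k."""
    return Counter(len(neigh) for neigh in adj.values())

def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neigh = list(adj[node])
    k = len(neigh)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if neigh[j] in adj[neigh[i]])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

def average_distance(adj):
    """Mean BFS hop count over ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

For a real crawl, the link direction and the sheer graph size would call for sampling rather than all-pairs BFS.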
Abstract: The North-eastern part of India, which receives
heavier rainfall than other parts of the subcontinent, is of great
concern nowadays with regard to climate change. High-intensity
rainfall of short duration and longer dry spells, occurring due to the
impact of climate change, also affect river morphology. In the present
study, an attempt is made to delineate the North-eastern region of
India into homogeneous clusters based on the Fuzzy Clustering
concept and to compare the clusters obtained by
conventional and nonconventional methods of clustering.
Clustering is adopted because the impact
of climate change can be studied in a homogeneous region without
much variation, which can be helpful in studies related to water
resources planning and management. Ten IMD (India Meteorological
Department) stations, situated in various regions of the North-east,
have been selected for forming the clusters. The results of the Fuzzy
C-Means (FCM) analysis show different clustering patterns for
different conditions. From the analysis and comparison, it can be
concluded that the nonconventional method using GCM data gives
somewhat better results than the others. However, further
analysis could use daily data instead of monthly means to
reduce the effect of standardization.
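For readers unfamiliar with FCM, a minimal pure-Python sketch follows. It assumes each station is described by a feature vector (for instance, hypothetical monthly mean rainfall values), z-scores each feature, and then alternates the standard FCM membership and centre updates with fuzzifier m = 2. It is not the study's code, and the deterministic initialization is a simplification.

```python
import math

def standardize(rows):
    """Z-score each column (feature) across all rows."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def fcm(points, c, m=2.0, iterations=100, eps=1e-9):
    """Fuzzy C-Means. Returns (centres, u) with u[i][j] = membership of point j in cluster i."""
    # Simplified deterministic initialisation: pick evenly spaced input points as centres
    step = max(1, len(points) // c)
    centres = [list(points[i * step]) for i in range(c)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        u = [[0.0] * len(points) for _ in range(c)]
        for j, x in enumerate(points):
            d = [max(math.dist(x, ctr), eps) for ctr in centres]
            for i in range(c):
                u[i][j] = 1.0 / sum((d[i] / d[k]) ** exponent for k in range(c))
        # Centre update: weighted mean with weights u_ij^m
        for i in range(c):
            w = [u[i][j] ** m for j in range(len(points))]
            total = sum(w)
            centres[i] = [sum(wj * x[f] for wj, x in zip(w, points)) / total
                          for f in range(len(points[0]))]
    return centres, u
```

A station's hard cluster is the one with its largest membership; the memberships themselves show how strongly a station belongs to more than one region.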
Abstract: Brain functional networks based on resting-state EEG
data were compared between patients with mild Alzheimer’s disease
(mAD) and matched patients with amnestic subtype of mild cognitive
impairment (aMCI). We integrated the time–frequency cross mutual
information (TFCMI) method to estimate the EEG functional
connectivity between cortical regions and the network analysis based
on graph theory to further investigate the alterations of functional
networks in mAD compared with aMCI group. We aimed at
investigating the changes of network integrity, local clustering,
information processing efficiency, and fault tolerance in mAD brain
networks for different frequency bands based on several topological
properties, including degree, strength, clustering coefficient, shortest
path length, and efficiency. Results showed that the disruptions of
network integrity and reductions of network efficiency in mAD
characterized by lower degree, decreased clustering coefficient, higher
shortest path length, and reduced global and local efficiencies in the
delta, theta, beta2, and gamma bands were evident. These significant
changes in network organization can assist in
discriminating mAD from aMCI in clinical practice.
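Two of the topological properties listed, node strength and global efficiency, can be sketched for a weighted connectivity matrix such as one built from TFCMI values. The sketch assumes a symmetric matrix of non-negative weights where zero means no connection, and converts each weight w to a path length 1/w; that conversion is a common convention in weighted network analysis, assumed here rather than stated in the abstract.

```python
import heapq

def strength(weights, node):
    """Sum of the connection weights attached to a node."""
    return sum(w for j, w in enumerate(weights[node]) if j != node)

def shortest_lengths(weights, source):
    """Dijkstra over path lengths 1/w (assumption); weight 0 means no edge."""
    n = len(weights)
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in range(n):
            w = weights[u][v]
            if v != u and w > 0:
                nd = d + 1.0 / w
                if nd < dist[v]:
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
    return dist

def global_efficiency(weights):
    """Mean inverse shortest path length over all ordered node pairs (unreachable pairs add 0)."""
    n = len(weights)
    total = 0.0
    for i in range(n):
        dist = shortest_lengths(weights, i)
        total += sum(1.0 / d for j, d in enumerate(dist) if j != i and d != float("inf"))
    return total / (n * (n - 1))
```

Lower efficiency in one group than another, as reported for mAD, corresponds to longer effective paths between cortical regions.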
Abstract: Recent advances in wireless networking technologies
have introduced several energy-aware routing protocols for sensor networks.
Such protocols aim to extend the network lifetime by reducing the
energy consumption of nodes. Many researchers are addressing
challenges that arise predominantly from energy
consumption. One class of protocols that addresses this energy
consumption issue is ‘Cluster based hierarchical routing protocol’. In
this paper, we discuss some of the major hierarchical
routing protocols for sensor networks. Furthermore, we
examine and compare several aspects and characteristics of a few
widely explored hierarchical clustering protocols and their operation
in wireless sensor networks (WSN). This paper also discusses
future research topics and the challenges of
hierarchical clustering in WSNs.
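To make cluster-based hierarchical routing concrete, the sketch below implements the cluster-head election rule of LEACH, a widely cited hierarchical protocol used here purely as an illustration (the abstract does not list which protocols are compared). A node that has not yet served as cluster head in the current epoch becomes one when a uniform random draw falls below T(n) = p / (1 − p · (r mod 1/p)), where p is the desired fraction of cluster heads and r the round number. The function names are hypothetical.

```python
import random

def leach_threshold(p, round_number):
    """LEACH threshold T(n) for nodes still eligible in the current epoch."""
    epoch = round(1 / p)
    return p / (1 - p * (round_number % epoch))

def elect_cluster_heads(node_ids, eligible, p, round_number, rng=random):
    """Return the nodes that become cluster heads this round.
    `eligible` holds nodes not yet elected in this epoch; the caller is
    assumed to refill it every 1/p rounds."""
    t = leach_threshold(p, round_number)
    heads = [n for n in node_ids if n in eligible and rng.random() < t]
    for n in heads:
        eligible.discard(n)
    return heads
```

The threshold rises to 1 in the last round of each epoch, so every node serves as cluster head once per epoch and the energy cost of that role is rotated.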
Abstract: In this paper, GSM signal strength was measured in
order to detect the type of signal fading phenomenon, using a
one-dimensional multilevel wavelet residual method and neural network
clustering to determine the average GSM signal strength received in
the study area. The wavelet residual method predicted that the GSM
signal experienced slow fading and was attenuated, with an MSE of 3.875 dB.
The neural network clustering revealed that the received signal levels
were mostly -75 dB, -85 dB and -95 dB. This means that the signal
received in the study area is weak.
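The abstract does not define the wavelet residual method in detail. One plausible reading, sketched here purely as an assumption, uses a multilevel Haar decomposition: the coarse approximation after a given number of levels is taken as the slowly varying (slow-fading) trend, and the residual is the measured signal minus that trend. The Haar wavelet and the MSE computed on the residual are illustrative choices, not the authors' settings.

```python
def haar_approximation(signal, levels):
    """Reconstruct a signal from its level-`levels` Haar approximation only
    (all detail coefficients set to zero). For Haar this equals replacing
    each block of 2**levels samples by its mean."""
    block = 2 ** levels
    if len(signal) % block:
        raise ValueError("signal length must be a multiple of 2**levels")
    coeffs = list(signal)
    for _ in range(levels):
        # Pairwise averaging (Haar scaling step, without the sqrt(2) normalisation)
        coeffs = [(coeffs[i] + coeffs[i + 1]) / 2 for i in range(0, len(coeffs), 2)]
    # Upsample back to the original length
    return [c for c in coeffs for _ in range(block)]

def wavelet_residual(signal, levels):
    """Assumed method: residual = signal - coarse Haar trend, plus its MSE."""
    trend = haar_approximation(signal, levels)
    residual = [x - t for x, t in zip(signal, trend)]
    mse = sum(r * r for r in residual) / len(residual)
    return trend, residual, mse
```

Under this reading, a small residual relative to a smoothly declining trend would point to slow fading rather than fast fading.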
Abstract: The need to extract R&D keywords from issues and use
them to retrieve R&D information is increasing rapidly. However, it is
difficult to identify related issues or distinguish between them. Although
the similarity between issues cannot be identified directly, issues that
always share the same R&D keywords can be determined using an R&D lexicon.
Specifically, the R&D keywords associated with a particular issue
imply the key technology elements needed to solve that
issue.
Furthermore, the relationships among issues that share the same
R&D keywords can be shown more systematically by clustering
them according to keywords, which facilitates sharing R&D results and
reusing R&D technology. Indirectly, redundant investment
in R&D can be reduced, as relevant R&D information can be shared
among corresponding issues and the reusability of related R&D can be
improved. Therefore, a methodology to cluster issues from the
perspective of common R&D keywords is proposed to satisfy these
demands.
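A minimal sketch of clustering issues by common R&D keywords, assuming each issue has already been mapped to a set of keywords through the R&D lexicon (the issue names and keywords in the test are hypothetical). Issues are linked when the Jaccard similarity of their keyword sets reaches a threshold, and clusters are the connected components of those links (single linkage). With a threshold of 1.0, only issues sharing exactly the same keywords are grouped.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_issues(issue_keywords, threshold=0.5):
    """Single-linkage grouping of issues with sufficiently similar keyword sets.
    issue_keywords: dict mapping issue name -> set of R&D keywords."""
    names = list(issue_keywords)
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(issue_keywords[a], issue_keywords[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())
```

The threshold controls the trade-off the abstract hints at: strict matching finds issues with identical technology needs, while looser matching surfaces partially reusable R&D.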
Abstract: An extensive amount of work has been done in data
clustering research, an unsupervised learning technique in Data
Mining, during the past two decades. Several approaches
and methods have emerged that focus on clustering diverse data
types, features of cluster models and similarity rates of clusters.
However, no single clustering algorithm performs best
in extracting efficient clusters. Consequently, to
address this issue, a new technique called the Cluster
Ensemble method emerged, offering an
alternative approach to the cluster analysis problem. The main
objective of a Cluster Ensemble is to aggregate diverse
clustering solutions so as to attain accuracy and
improve on the quality of the individual clustering algorithms. Given
the massive and rapid development of new methods in the field of
data mining, a critical analysis of
existing techniques and future directions is essential. This paper presents a
comparative analysis of different cluster ensemble methods along
with their methodologies and salient features. This
analysis should be useful for the community of clustering
experts and help in deciding the most appropriate method for
the problem at hand.
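One widely used consensus function, evidence accumulation through a co-association matrix, illustrates how an ensemble aggregates diverse clusterings. It is chosen here only as an example, since the abstract does not single out a method. The sketch assumes each base clustering is a list of labels over the same objects, and forms the consensus by linking pairs co-clustered in more than a fraction `min_agreement` (a hypothetical parameter) of the base solutions.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of objects shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
            for i in range(n)]

def consensus(labelings, min_agreement=0.5):
    """Consensus labels: connected components of pairs whose
    co-association exceeds min_agreement."""
    ca = co_association(labelings)
    n = len(ca)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first traversal of strongly co-associated objects
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and ca[i][j] > min_agreement:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Because only co-membership is compared, the base clusterings may use different label names and even different numbers of clusters.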
Abstract: Search is the most obvious application of information
retrieval. The variety of widely available biomedical data is
enormous and expanding fast. As a result, existing
techniques are not sufficient to extract the most interesting patterns
from the collection according to user requirements. Recent research is
concentrating more on semantic-based searching than on traditional
term-based searching. Algorithms for semantic search are
implemented based on the relations that exist between the words of the
documents. Ontologies are used as domain knowledge for identifying
semantic relations as well as for structuring the data for effective
information retrieval. Annotating data with ontology concepts is
a widely used practice for clustering documents. In
this paper, concept-based and annotation-based indexing are proposed
for clustering biomedical documents. The Fuzzy c-means (FCM)
clustering algorithm is used to cluster the documents. The
performance of the proposed methods is compared with traditional
term-based clustering for PubMed articles from five different disease
communities. The experimental results show that the proposed
methods outperform term-based fuzzy clustering.
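Concept-based indexing can be sketched as replacing surface terms with ontology concept identifiers before building document vectors, so that synonyms contribute to the same dimension. The small term-to-concept dictionary below is hypothetical (a real system would annotate against a biomedical ontology), tokenization is plain whitespace splitting, and the resulting vectors would then be clustered by FCM in place of raw term vectors.

```python
import math
from collections import Counter

# Hypothetical term -> concept mapping standing in for ontology annotation.
TERM_TO_CONCEPT = {
    "tumor": "C_NEOPLASM", "tumour": "C_NEOPLASM", "neoplasm": "C_NEOPLASM",
    "insulin": "C_INSULIN", "glucose": "C_GLUCOSE", "sugar": "C_GLUCOSE",
}

def concept_vector(text):
    """Count ontology concepts in a document (whitespace tokens; unmapped terms ignored)."""
    tokens = text.lower().split()
    return Counter(TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT)

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Two documents that say "tumor" and "neoplasm" share no terms, so term-based similarity is zero, yet their concept vectors coincide; this is the effect that lets concept-based clustering outperform term-based clustering.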
Abstract: The human face plays a fundamental role in the appearance
of individuals, so the importance of facial surgery is undeniable.
Thus, appropriate and accurate facial skin
segmentation is needed in order to extract different features. Since
the Fuzzy C-Means (FCM) clustering algorithm does not work well for
noisy images and outliers, in this paper we exploit the Possibilistic
C-Means (PCM) algorithm to segment facial skin. For this
purpose, facial images are first converted from the RGB to the YCbCr color
space. To evaluate the performance of the proposed algorithm, the
database of Sahand University of Technology, Tabriz, Iran was used.
For comparison, the
FCM and Expectation-Maximization (EM) algorithms are also used
for facial skin segmentation. The proposed method shows better
results than the other segmentation methods, with a
misclassification error of 0.032 and a region area error of 0.045.
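A sketch of the two steps named above, assuming 8-bit RGB pixels: the full-range YCbCr conversion with BT.601 coefficients (as used in JPEG), and the possibilistic C-Means typicality update t = 1 / (1 + (d²/η)^(1/(m−1))). Because typicalities are not forced to sum to one across clusters, outlying pixels get low typicality in every cluster, which is the robustness that motivates PCM over FCM. Keeping the bandwidths η fixed, and choosing which channels to cluster (for example only Cb and Cr), are assumptions of this sketch.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JPEG) conversion for 8-bit RGB values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def pcm(points, centres, etas, m=2.0, iterations=50):
    """Possibilistic C-Means with fixed bandwidths `etas` (assumption).
    Returns updated centres and typicalities t[i][j] (cluster i, point j)."""
    centres = [list(c) for c in centres]
    for _ in range(iterations):
        # Typicality update: independent per cluster, no sum-to-one constraint
        t = []
        for ctr, eta in zip(centres, etas):
            row = []
            for x in points:
                d2 = sum((a - b) ** 2 for a, b in zip(x, ctr))
                row.append(1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0))))
            t.append(row)
        # Centre update: weighted mean with weights t_ij^m
        for i in range(len(centres)):
            w = [tij ** m for tij in t[i]]
            total = sum(w)
            centres[i] = [sum(wj * x[f] for wj, x in zip(w, points)) / total
                          for f in range(len(points[0]))]
    return centres, t
```

In practice PCM is often initialized from an FCM result, and η is estimated from the spread of each FCM cluster.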
Abstract: The aim of this paper is to assess the influence of several indicators determining the innovativeness of countries' economies by applying selected soft computing methods. Such methods enable us to identify correlations between indicators for the period 2006-2010. The main focus of the paper is on selecting proper computer tools for solving this problem. As tools supporting identification, the X-means clustering algorithm, the Apriori rule generation algorithm and Self-Organizing Feature Maps (SOMs) have been selected. The paper is preliminary in character. We briefly describe the usefulness of the selected approaches and indicate some challenges for further research.
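Of the three tools listed, the Apriori algorithm is the simplest to sketch. The version below finds frequent itemsets level by level, pruning candidates with the Apriori property that every subset of a frequent itemset must itself be frequent. The transactions in the test are hypothetical discretized indicator values (for example "hiRD" for high R&D spending), not data from the paper, and rule generation from the frequent itemsets is omitted.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in items if support(s) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must already be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent
```

Association rules such as "hiRD ⇒ hiPatents" would then be read off the frequent itemsets by comparing their supports.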
Abstract: A dual-tiered network model is designed to overcome the problems of energy alert and fault tolerance. This model minimizes delay time and overcomes link failures. This paper studies the performance of the dual-tiered network model, comparing the CA and LS schemes with the DEO optimal. We then evaluate the Integrated Network Topological Control and Key Management (INTK) scheme, which was proposed to add security features to wireless sensor networks. Clustering efficiency, level of protection and time complexity are among the parameters of the INTK scheme that were analyzed. Finally, we evaluate the Cluster based Energy Competent n-coverage scheme (CEC n-coverage scheme), which ensures area coverage in wireless sensor networks.
Abstract: Textual data plays an important role in the modern
world. The possibilities for applying data mining techniques to
uncover hidden information in large text
collections are immense. The Growing Self Organizing Map (GSOM)
is a highly successful member of the Self Organising Map family
and has been used as a clustering and visualisation tool across a wide
range of disciplines to discover hidden patterns in the data.
A comprehensive analysis of the GSOM’s capabilities as a text
clustering and visualisation tool has not been published so far. These
functionalities, namely map visualisation, automatic
cluster identification and hierarchical clustering, are
presented in this paper and are further demonstrated with experiments
on a benchmark text corpus.
Abstract: Recently, distributed generation (DG) technologies have received much attention for the potential energy savings and reliability assurances that their widespread adoption might achieve. Distribution feeder reconfiguration (DFR) is one of the most important control schemes in distribution networks, and it can be affected by DGs. This paper presents a new approach to DFR in distribution networks with wind turbines. The main objective of the DFR is to minimize the deviation of the bus voltage. Since DFR is a nonlinear optimization problem, we apply the Adaptive Modified Firefly Optimization (AMFO) approach to solve it. Because the individual single-objective functions conflict, a fuzzy-based clustering technique is employed to obtain the set of optimal solutions called Pareto solutions. The approach is tested on the IEEE 32-bus standard test system.
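The step of reducing candidate solutions to a Pareto set, and then choosing a compromise among them, can be sketched as follows. The sketch assumes all objectives are minimized and uses a fuzzy min-max decision rule that is common in power-system studies: each objective gets a linear membership μ = (f_max − f)/(f_max − f_min), and the solution with the largest sum of memberships is selected. It reproduces neither AMFO nor the paper's fuzzy clustering of the solution set, and the objective values in the test are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only non-dominated objective vectors."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

def best_compromise(front):
    """Fuzzy decision: pick the solution with the largest sum of linear memberships."""
    n_obj = len(front[0])
    lo = [min(s[k] for s in front) for k in range(n_obj)]
    hi = [max(s[k] for s in front) for k in range(n_obj)]

    def membership(s):
        total = 0.0
        for k in range(n_obj):
            span = hi[k] - lo[k]
            # Membership 1 at the best value of objective k, 0 at the worst
            total += 1.0 if span == 0 else (hi[k] - s[k]) / span
        return total

    return max(front, key=membership)
```

Dividing each score by the sum over all solutions, as is often done, does not change which solution wins, so the normalization is omitted here.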
Abstract: In the present study, the institutional context is discussed in terms of companies’ entry mode choice. In contrast to many previous analyses, instead of one or two aggregated variables, a set of eleven determinants is used to establish conditions favorable to equity and non-equity internationalization. Based on secondary data, 140 countries are analyzed and grouped into clusters with similar frameworks. The range of economies explored is wide, as it covers all regions distinguished by the World Bank. The results may prove a useful alternative for operationalizing institutional variables in further research on entry modes or strategic management in international markets.
Abstract: Microarray gene expression data play a vital role in biological processes, gene regulation and disease mechanisms. A bicluster in gene expression data is a subset of genes exhibiting consistent patterns under a subset of conditions, and finding a bicluster is an optimization problem. In recent years, swarm intelligence techniques have become popular because many real-world problems are increasingly large, complex and dynamic. Given the size and complexity of these problems, an optimization technique is needed that finds a near-optimal solution within a reasonable amount of time. In this paper, the Particle Swarm Optimization (PSO), Shuffled Frog Leaping (SFL) and Cuckoo Search (CS) algorithms are analyzed on four benchmark gene expression datasets. The experimental results show that CS outperforms PSO and SFL on three datasets, while SFL performs better on one dataset. This work also determines the biological relevance of the biclusters using Gene Ontology in terms of function, process and component.
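Swarm-based biclustering with PSO, SFL or CS needs a fitness function that scores a candidate bicluster. A common choice, assumed here because the abstract does not name one, is the mean squared residue (MSR) of Cheng and Church: for selected rows I and columns J, the residue of entry a_ij is a_ij − a_iJ − a_Ij + a_IJ (entry minus its row mean and column mean, plus the overall mean), and a low MSR indicates a coherent pattern.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the sub-matrix selected by rows x cols."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [sum(r) / len(cols) for r in sub]
    col_means = [sum(sub[i][j] for i in range(len(rows))) / len(rows) for j in range(len(cols))]
    overall = sum(row_means) / len(rows)
    total = 0.0
    for i, r in enumerate(sub):
        for j, value in enumerate(r):
            residue = value - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (len(rows) * len(cols))
```

A swarm particle would encode a choice of rows and columns, and the optimizer would search for large biclusters whose MSR stays below a threshold.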
Abstract: The present work has been carried out to evaluate the diversity of a collection of 78 quinoa accessions developed through recurrent selection from Andean germplasm introduced to Morocco in the winter of 2000. Twenty-three quantitative and qualitative characters were used to evaluate genetic diversity and the relationships between the accessions, and also to establish a core collection in Morocco. Considerable variation was found among the accessions in plant morphology and growth behavior. Data analysis showed positive correlations of plant height, plant fresh weight and dry weight with grain yield, while days to flowering was negatively correlated with grain yield. The first four principal components (PCs) contributed 74.76% of the variability: PC1 accounted for 42.86% of the total variation, PC2 for 15.37%, PC3 for 9.05% and PC4 for 7.49%. Plant size, days to grain filling and days to maturity are correlated with PC1, while seed size, inflorescence density and mildew resistance are correlated with PC2. Hierarchical cluster analysis grouped the 78 quinoa accessions into four main groups and ten sub-clusters. Clustering was associated with days to maturity and with plant size and seed size traits.
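A minimal sketch of agglomerative hierarchical clustering with average linkage (UPGMA) is shown below. The abstract does not state the linkage or distance measure, so Euclidean distance and average linkage are assumptions, and the trait vectors in the test are hypothetical (real traits on different scales would normally be standardized first). Merging stops when the requested number of clusters remains.

```python
import math

def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between members of two clusters
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge the closest pair; y > x, so deleting y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

Recording the merge distances instead of stopping early would yield the full dendrogram, from which main groups and sub-clusters can be cut at two different heights.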
Abstract: As big data analysis becomes more important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of failure are closely related. The purpose of this study is to analyze patterns that affect the final test results using die map based clustering. Many studies have been conducted using die data from the semiconductor test process. However, such analysis is limited because the test data are only indirectly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. The framework consists of three phases. In the first phase, a die map is created from the fail bit data in each sub-area of the die. In the second phase, clustering is performed on the map data. In the third phase, patterns that affect the final test results are identified. Finally, the proposed three phases are applied to actual industrial data, and the experimental results show potential for field application.
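The first two phases can be sketched under explicit assumptions: each die is divided into a hypothetical grid of sub-areas, fail-bit coordinates are counted per sub-area to form a die map vector, and die maps are then grouped by k-means. K-means is a stand-in, since the abstract does not name the clustering algorithm, and the die dimensions, grid size and fail-bit format are illustrative only.

```python
import math

def die_map(fail_bits, die_w, die_h, grid_x, grid_y):
    """Count fail bits (x, y) per sub-area of a grid_x * grid_y grid; returns a flat vector."""
    counts = [0] * (grid_x * grid_y)
    for x, y in fail_bits:
        cx = min(int(x / die_w * grid_x), grid_x - 1)  # clamp points on the far edge
        cy = min(int(y / die_h * grid_y), grid_y - 1)
        counts[cy * grid_x + cx] += 1
    return counts

def kmeans(vectors, k, iterations=20):
    """Plain k-means (stand-in clustering) with the first k vectors as initial centres."""
    centres = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each die map to its nearest centre
        labels = [min(range(k), key=lambda c: math.dist(v, centres[c])) for v in vectors]
        # Move each centre to the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

The third phase would then relate each cluster's spatial fail pattern to the final test outcomes of its dies.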
Abstract: A face recognition system is a computer application for automatically identifying or verifying a person from a digital image or a video frame. Many algorithms have been proposed for face recognition. Vector Quantization (VQ) based face recognition is a novel approach. Here, a new codebook generation method for VQ-based face recognition using Integrated Adaptive Fuzzy Clustering (IAFC) is proposed. IAFC is a fuzzy neural network that incorporates a fuzzy learning rule into a competitive neural network. The performance of the proposed algorithm is demonstrated on the publicly available AT&T database, Yale database and Indian Face database, and on a small face database, the DCSKU database, created in our lab. On all the databases, the proposed approach achieved a higher recognition rate than most existing methods. In terms of Equal Error Rate (EER), the proposed codebook also outperforms the existing methods.
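Once a codebook has been generated for each enrolled person (by IAFC in the paper; that training step is not reproduced here), one common way to recognize a face with VQ is to measure how well each codebook encodes the test image's feature vectors. The sketch assumes images have already been turned into lists of feature vectors (for example, flattened pixel blocks) and uses average nearest-codeword distortion as the matching score; the codebooks in the test are hypothetical.

```python
import math

def distortion(vectors, codebook):
    """Average distance from each feature vector to its nearest codeword."""
    return sum(min(math.dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Return the identity whose codebook encodes the vectors with the lowest distortion.
    codebooks: dict mapping person id -> list of codewords."""
    return min(codebooks, key=lambda person: distortion(vectors, codebooks[person]))
```

For verification rather than identification, the distortion against the claimed identity's codebook would be compared with a threshold, and sweeping that threshold gives the Equal Error Rate.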
Abstract: In pattern clustering, nearest neighbor point computation is a challenging issue for many research applications such as Remote Sensing, Computer Vision, Pattern Recognition and Statistical Imaging. Nearest neighbor computation is essential for classifying the large volume of pixels (voxels) in order to localize the active regions of interest (AROI). Furthermore, spatial metric relationships must be computed across diverse imaging areas in pattern recognition applications. In this paper, we propose a new methodology for finding the nearest neighbor point that builds a virtual grid of hexagonal cells and then locates every point within them. An algorithm is suggested that minimizes computation and improves the turnaround time of the process. For a query point Φ, candidate neighbors are sought hexagon by hexagon, and the search is repeated until an AROI point is found for Φ. Once a point Υ is located, searching continues in the nearest hexagons in a circular way: the first hexagon is considered level 0 (L0) and the surrounding hexagons level 1 (L1). If Υ is located in L1, the search continues in the next level (L2) to ensure that Υ is the nearest neighbor of Φ. Based on the experimental results, the proposed method has an advantage over traditional methods in minimizing the time complexity of the neighbor search, which in turn improves classification efficiency.
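The hexagonal search can be sketched as follows, assuming 2-D points, pointy-top hexagons of circumradius `size` and axial cell coordinates (all names here are hypothetical). Points are bucketed into cells once; a query then scans rings of cells at level 0, 1, 2, and so on around its own cell. Instead of the fixed "search one more level" rule described above, this sketch stops on a conservative distance bound: centres of cells at hex distance k are at least 1.5·size·k apart, so every point in ring k is at least 1.5·size·k − 2·size from the query, and scanning can stop once that bound reaches the best distance found. This guarantees the exact nearest neighbor.

```python
import math
from collections import defaultdict

SQRT3 = math.sqrt(3)

def point_to_hex(x, y, size):
    """Axial (q, r) coordinates of the pointy-top hexagon containing (x, y)."""
    q = (SQRT3 / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube rounding: round all three cube coordinates, then recompute the one
    # with the largest rounding error so that x + y + z = 0 still holds
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

def hex_distance(a, b):
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2

def build_grid(points, size):
    """Bucket points by the hexagonal cell that contains them."""
    grid = defaultdict(list)
    for p in points:
        grid[point_to_hex(p[0], p[1], size)].append(p)
    return grid

def ring(center, k):
    """All cells at hex distance exactly k from center (level k)."""
    if k == 0:
        return [center]
    cells = []
    for dq in range(-k, k + 1):
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1):
            if hex_distance((dq, dr), (0, 0)) == k:
                cells.append((center[0] + dq, center[1] + dr))
    return cells

def nearest(grid, query, size):
    """Exact nearest neighbour of `query` among the bucketed points (None if empty)."""
    if not grid:
        return None
    home = point_to_hex(query[0], query[1], size)
    max_level = max(hex_distance(home, cell) for cell in grid)
    best, best_d = None, math.inf
    for k in range(max_level + 1):
        # No point in level k or beyond can be closer than this lower bound
        if best_d <= 1.5 * size * k - 2 * size:
            break
        for cell in ring(home, k):
            for p in grid.get(cell, ()):
                d = math.dist(p, query)
                if d < best_d:
                    best, best_d = p, d
    return best
```

The bound-based stop costs a few extra rings compared with a fixed one-level rule, but it avoids missing a closer point that sits just across a cell boundary two levels out.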