Abstract: This paper reports work done to improve the modeling of complex processes when only small experimental data sets are available. Neural networks are used to capture the nonlinear underlying phenomena contained in the data set and to partly eliminate the burden of having to specify the structure of the model completely. Two different types of neural networks were applied to the Pulping of Sugar Maple problem. Three-layer feed-forward neural networks trained with Preconditioned Conjugate Gradient (PCG) methods were used in this investigation. Preconditioning is a method to improve convergence by lowering the condition number and increasing the clustering of the eigenvalues. The idea is to solve a modified problem in which M is a positive-definite preconditioner that is closely related to A. We mainly focused on Preconditioned Conjugate Gradient-based training methods that originated in optimization theory, namely Preconditioned Conjugate Gradient with Fletcher-Reeves Update (PCGF), Preconditioned Conjugate Gradient with Polak-Ribiere Update (PCGP) and Preconditioned Conjugate Gradient with Powell-Beale Restarts (PCGB). In the simulations, the PCG methods proved to be robust against phenomena such as oscillations due to large step sizes.
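The preconditioning idea can be illustrated on the simplest case, a linear system A x = b with symmetric positive-definite A. The sketch below is not the network training procedure from the abstract: it assumes a Jacobi preconditioner M = diag(A), and the function name `pcg` and its parameters are hypothetical.

```python
def pcg(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive-definite A (list of lists)
    by conjugate gradients with a Jacobi (diagonal) preconditioner."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    # Assumption: M = diag(A), so applying M^{-1} is an element-wise scaling.
    m_inv = [1.0 / A[i][i] for i in range(n)]

    x = [0.0] * n
    r = list(b)                                # residual b - A x for x = 0
    z = [m_inv[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                # exact step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                     # same form as the Fletcher-Reeves ratio
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x
```

For example, `pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` returns a vector close to (1/11, 7/11). In network training the quadratic model is replaced by the error function and the step length by a line search, which is where the Fletcher-Reeves, Polak-Ribiere and Powell-Beale variants differ.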
Abstract: In this paper, a subtractive-clustering-based fuzzy inference system approach is used for early detection of faults in function-oriented software systems. The approach has been tested with real-time defect datasets of NASA software projects named PC1 and CM1. Both the code-based model and the joined model (a combination of the requirement-based and code-based metrics) of the datasets are used for training and testing the proposed approach. The performance of the models is recorded in terms of Accuracy, MAE and RMSE values. The proposed approach performs better with the joined model. From the results obtained, it can be concluded that clustering and fuzzy logic together provide a simple yet powerful means to model the early detection of faults in function-oriented software systems.
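As a rough illustration of the clustering step, subtractive clustering is usually described as scoring every data point by a density "potential" and taking the highest-scoring point as the first cluster center. The sketch below covers only that first scoring pass, using the commonly cited form with alpha = 4 / radius^2. The function name, the `radius` default and the sample points are assumptions, not values from the abstract.

```python
import math

def potentials(points, radius=1.0):
    """Density potential of each point: sum of exp(-alpha * squared distance)
    to every point in the data set, with alpha = 4 / radius**2."""
    alpha = 4.0 / radius ** 2
    return [
        sum(math.exp(-alpha * math.dist(p, q) ** 2) for q in points)
        for p in points
    ]

# Hypothetical 2-D metric vectors: three close together, one isolated.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
scores = potentials(sample)
first_center = max(range(len(sample)), key=scores.__getitem__)
```

The isolated point receives a potential close to 1 (its own contribution), so it is not chosen as the first center.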
Abstract: The network traffic data provided for the design of
intrusion detection are always large, contain ineffective information, and
hold only limited and ambiguous information about users' activities.
We study these problems and propose a two-phase approach for our
intrusion detection design. In the first phase, we develop a
correlation-based feature selection algorithm to remove the worthless
information from the original high dimensional database. Next, we
design an intrusion detection method to solve the problems of
uncertainty caused by limited and ambiguous information. In the
experiments, we choose six UCI databases and the DARPA KDD99
intrusion detection data set as our evaluation data. Empirical studies
indicate that our feature selection algorithm is capable of reducing the
size of the data set. Our intrusion detection method achieves better
performance than the other participating intrusion detectors.
Abstract: A genetic algorithm (GA) based feature subset
selection algorithm is proposed in which the correlation structure of
the features is exploited. The subset of features is validated according
to the classification performance. Features derived from the
continuous wavelet transform are potentially strongly correlated.
GAs that do not take the correlation structure of features into
account are inefficient. The proposed algorithm forms clusters of
correlated features and searches for a good candidate set of clusters.
A search within the clusters is then performed. Different
simulations of the algorithm on a real-case data set with strong
correlations between features show the increased classification
performance. Comparison is performed with a standard GA without
use of the correlation structure.
Abstract: In recent years, research on wireless sensor
networks has increased steadily, and many studies have focused on
reducing the energy consumption of sensor nodes to extend their lifetimes.
In this paper, the issue of energy consumption is investigated and two
adaptive mechanisms are proposed to extend the network lifetime.
This study uses a high-energy-first scheme to determine cluster heads
for data transmission. Thus, energy consumption in each cluster is
balanced and network lifetime can be extended. In addition, this study
uses cluster merging and dynamic routing mechanisms to further
reduce energy consumption during data transmission. The simulation
results show that the proposed method can effectively extend the
lifetime of wireless sensor networks, and that it is suitable for different base
station locations.
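A minimal sketch of a high-energy-first choice, assuming each cluster is given as a mapping from node id to residual energy (the data layout and the name `pick_cluster_heads` are hypothetical, and the abstract does not specify tie-breaking):

```python
def pick_cluster_heads(clusters):
    """clusters maps a cluster id to {node id: residual energy}.
    Returns {cluster id: id of the node with the most residual energy}."""
    return {
        cid: max(nodes, key=nodes.get)   # highest-energy node becomes head
        for cid, nodes in clusters.items()
        if nodes                         # skip clusters with no live nodes
    }

# Hypothetical residual energies in joules.
heads = pick_cluster_heads({
    "a": {"n1": 0.4, "n2": 0.9},
    "b": {"n3": 0.2},
})
```

Re-running the selection each round moves the head role to whichever node currently has the most energy, which is what balances consumption inside a cluster.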
Abstract: Brain ArterioVenous Malformation (BAVM) is an abnormal tangle of brain blood vessels in which arteries shunt directly into veins with no intervening capillary bed, causing high pressure and a risk of hemorrhage. The success of treatment by embolization in interventional neuroradiology depends strongly on the accuracy of vessel visualization. In this paper, the performance of clustering techniques for vessel segmentation from 3-D rotational angiography (3DRA) images is investigated and a new segmentation technique is proposed. The method consists of a preprocessing step of image enhancement, after which K-Means (KM), Fuzzy C-Means (FCM) and Expectation Maximization (EM) clustering are used to separate vessel pixels from the background and, when possible, artery pixels from vein pixels. A post-processing step that removes false-alarm components is applied before a three-dimensional volume of the vessels is constructed. The proposed method was tested on six datasets and assessed by a medical expert. The results obtained showed encouraging segmentations.
Abstract: In this paper we study the fuzzy c-means clustering algorithm
combined with the principal components method. Demonstrative
analysis indicates that the new clustering method performs better than
some other clustering algorithms. We also consider the validity of the clustering
method.
Abstract: A two-phase neurofuzzy approach for a given set of input-output training data is proposed. First, the data set is partitioned automatically into a set of clusters, and a fuzzy if-then rule is extracted from each cluster to form a fuzzy rule base. Second, a fuzzy neural network is constructed accordingly and its parameters are tuned to increase the precision of the fuzzy rule base. This network is able to learn and optimize the rule base of a Sugeno-like fuzzy inference system using a hybrid learning algorithm that combines gradient descent and the least mean square algorithm. The proposed neurofuzzy system has the advantage of determining the number of rules automatically; it also reduces the number of rules, decreases computational time, learns faster and consumes less memory. The authors also investigate how neurofuzzy techniques can be applied in the area of control theory to design a fuzzy controller for modelling linear and nonlinear dynamic systems from a set of input/output data. Simulation analysis is carried out on a wide range of processes, to identify nonlinear components online in a control system, and on a benchmark problem involving the prediction of a chaotic time series. Furthermore, well-known examples of linear and nonlinear systems are simulated in the Matlab/Simulink environment. The combination is also illustrated in modeling the relationship between automobile trips and demographic factors.
Abstract: It has been established that microRNAs (miRNAs) play
an important role in gene expression by post-transcriptional regulation
of messenger RNAs (mRNAs). However, the precise relationships
between microRNAs and their target genes in terms of numbers,
types and biological relevance remain largely unclear. Dissecting the
miRNA-target relationships will provide more insight for miRNA
target identification and validation and therefore promote the understanding
of miRNA function. In miRBase, miRanda is the key
algorithm used for target prediction for Zebrafish. This algorithm
is high-throughput but produces many false positives (noise). Since
validating a large number of targets through laboratory experiments
is very time consuming, computational methods for miRNA
target validation should be developed. In this paper, we present an
integrative method to investigate several aspects of the relationships
between miRNAs and their targets with the final purpose of extracting
high-confidence targets from the pool of miRanda-predicted targets. This is
achieved by using techniques ranging from statistical tests to
clustering and association rules. Our research focuses on Zebrafish.
It was found that validated targets do not necessarily associate with
the highest sequence matching. Besides, for some miRNA families,
the frequency of their predicted targets is significantly higher in the
genomic region near their own physical location. Finally, in a case
study of dre-miR-10 and dre-miR-196, it was found that the predicted
target genes hoxd13a, hoxd11a, hoxd10a and hoxc4a of dre-miR-10,
and hoxa9a, hoxc8a and hoxa13a of dre-miR-196, have
characteristics similar to those of validated target genes and therefore represent
high-confidence target candidates.
Abstract: In this paper, we present a system for content-based
retrieval from a large database of classified satellite images, based on
the user's relevance feedback (RF). In our proposed system, we
divide each satellite image scene into small subimages, which are stored
in the database. A modified radial basis function neural network
plays an important role in clustering the subimages of the database according
to the Euclidean distance between the query feature vector and the
feature vectors of the other subimages. The advantage of using the RF
technique in such queries is demonstrated by analyzing the database
retrieval results.
Abstract: This paper uses the radial basis function neural
network (RBFNN) for system identification of nonlinear systems.
Five nonlinear systems are used to examine the behavior of the RBFNN in
modeling nonlinear systems: a dual tank system, a single tank system,
a DC motor system, and two
academic models. The feed-forward method is considered in this
work for modelling the nonlinear dynamic models. The K-Means
clustering algorithm is used to select the centers of the
radial basis function network because it is reliable, offers fast
convergence and can handle large data sets. The least mean square
method is used to adjust the weights to the output layer, and the
Euclidean distance method is used to determine the width of the Gaussian
function.
Abstract: Geographic profiling has successfully assisted investigations of serial crimes. Considering the multi-cluster nature of serial crime locations, we propose a Multi-point Centrography model as a natural extension of Single-point Centrography for geographic profiling. K-means clustering is first performed on the data samples, and Single-point Centrography is then applied to derive a probability distribution for each cluster. Finally, a weighted combination of the distributions is formed to predict the location of the next crime. An experimental study on real cases demonstrates the effectiveness of the proposed model.
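The three stages (cluster, summarize each cluster, combine) can be sketched as follows. The abstract does not give the per-cluster distribution or the weights, so this sketch assumes a Gaussian distance-decay score around each cluster centroid and weights equal to cluster-size fractions. The names `kmeans`, `multi_point_score` and `sigma` are hypothetical, and the coordinates are made up.

```python
import math

def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations from caller-supplied initial centers."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda k: math.dist(p, centers[k]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[k]
            for k, g in enumerate(groups)
        ]
    return centers, groups

def multi_point_score(location, centers, groups, sigma=1.0):
    """Weighted sum of one distance-decay term per cluster.
    Assumptions: Gaussian decay, weights = cluster-size fractions."""
    total = sum(len(g) for g in groups)
    return sum(
        (len(g) / total)
        * math.exp(-math.dist(location, c) ** 2 / (2 * sigma ** 2))
        for c, g in zip(centers, groups)
    )

# Hypothetical crime sites forming two spatial clusters.
sites = [(0, 0), (0, 2), (10, 10), (10, 12), (12, 10)]
centers, groups = kmeans(sites, [(0, 0), (10, 10)])
score_near = multi_point_score((0, 1), centers, groups)
score_far = multi_point_score((5, 5), centers, groups)
```

Locations near a cluster centroid score higher than locations between clusters, which is the behavior a single centroid over all sites would miss.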
Abstract: In this paper an algorithm is used to detect the color defects of ceramic tiles. First, the image of a normal tile is clustered using the Genetic C-means Clustering Algorithm (GCMA), which yields the best cluster centers. C-means is a common clustering algorithm that optimizes an objective function based on a distance measure between data points and the cluster centers in the data space. Here the objective function is the mean square error. After the best centers are found, each pixel of the image is assigned to the cluster with the closest center. Then the maximum error of each cluster is computed: the maximum distance between the cluster center and all the pixels that belong to it. After the errors are computed, all the pixels of the defective tile image are clustered based on the centers obtained from the normal tile image in the previous stage. Pixels whose distance from their cluster center exceeds the maximum error of that cluster are considered defective pixels.
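The thresholding stage described above can be sketched as follows. The genetic search for cluster centers is not reproduced: the centers are taken as given, pixels are assumed to be RGB tuples, and all function names and sample values are hypothetical.

```python
import math

def nearest(pixel, centers):
    """Index of, and distance to, the closest cluster center."""
    k = min(range(len(centers)), key=lambda i: math.dist(pixel, centers[i]))
    return k, math.dist(pixel, centers[k])

def max_errors(normal_pixels, centers):
    """Per-cluster maximum distance between a center and its assigned pixels."""
    errors = [0.0] * len(centers)
    for px in normal_pixels:
        k, d = nearest(px, centers)
        errors[k] = max(errors[k], d)
    return errors

def defective_pixels(test_pixels, centers, errors):
    """Indices of pixels farther from their center than that cluster's max error."""
    flagged = []
    for i, px in enumerate(test_pixels):
        k, d = nearest(px, centers)
        if d > errors[k]:
            flagged.append(i)
    return flagged

# Assumed centers (stand-ins for the GCMA result) and made-up RGB pixels.
centers = [(200, 200, 200), (50, 50, 50)]
normal = [(198, 200, 202), (205, 200, 200), (52, 50, 48), (50, 55, 50)]
errors = max_errors(normal, centers)
flagged = defective_pixels(
    [(201, 200, 200), (120, 120, 120), (50, 50, 44)], centers, errors
)
```

Because the thresholds come from the normal tile alone, any color on the test tile that falls outside the spread of every learned cluster is reported, without a model of what defects look like.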
Abstract: Automatic segmentation of skin lesions is the first step
towards the development of computer-aided diagnosis of melanoma.
Although numerous segmentation methods have been developed,
few studies have focused on determining the most discriminative
and effective color space for melanoma application. This paper
proposes a novel automatic segmentation algorithm using color space
analysis and clustering-based histogram thresholding, which is able to
determine the optimal color channel for segmentation of skin lesions.
To demonstrate the validity of the algorithm, it is tested on a set of 30
high resolution dermoscopy images and a comprehensive evaluation
of the results is provided, in which borders manually drawn by four
dermatologists are compared to automated borders detected by the
proposed algorithm. The evaluation is carried out by applying three
previously used metrics of accuracy, sensitivity, and specificity and
a new metric of similarity. Through ROC analysis and ranking of the
metrics, it is shown that the best results are obtained with the X and
XoYoR color channels, which result in an accuracy of approximately
97%. The proposed method is also compared with two
state-of-the-art skin lesion segmentation methods, which demonstrates the
effectiveness and superiority of the proposed segmentation method.
Abstract: Because wireless sensor networks are energy-constrained networks,
the energy efficiency of sensor nodes is the main design issue.
Clustering of nodes is an energy efficient approach. It prolongs the
lifetime of wireless sensor networks by avoiding long distance communication.
Clustering algorithms operate in rounds. Performance of
clustering algorithm depends upon the round time. A large round
time consumes more energy of cluster heads while a small round
time causes frequent re-clustering. Existing clustering algorithms therefore
apply a trade-off to the round time and calculate it from the initial
parameters of the network. But it is not appropriate to use a round time
based on initial parameters throughout the network lifetime
because wireless sensor networks are dynamic in nature (nodes can be
added to the network or some nodes can run out of energy). In this paper
a variable round time approach is proposed that calculates round
time depending upon the number of active nodes remaining in the
field. The proposed approach makes the clustering algorithm adaptive
to network dynamics. For simulation, the approach is implemented
with LEACH in NS-2, and the results show a 6% increase
in network lifetime, a 7% increase in 50% node death time and a 5%
improvement in the data units gathered at the base station.
Abstract: In this paper, a clustering algorithm named K-Harmonic
means (KHM) was employed in the training of Radial
Basis Function Networks (RBFNs). KHM organized the data in
clusters and determined the centres of the basis function. The popular
clustering algorithms, namely K-means (KM) and Fuzzy c-means
(FCM), are highly dependent on the initial identification of elements
that represent the cluster well. In KHM, the problem can be avoided.
This leads to improvement in the classification performance when
compared to other clustering algorithms. A comparison of the
classification accuracy was performed between KM, FCM and KHM.
The classification performance is based on the benchmark data sets:
Iris Plant, Diabetes and Breast Cancer. RBFN training with the KHM
algorithm shows better accuracy in classification problems.
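One way to see why KHM is less sensitive to initialization is its objective, which is commonly stated as a sum over data points of the harmonic mean of the distances to all centers, where K-means uses only the minimum distance. The sketch below evaluates that objective only and is not the RBFN training procedure. The exponent `p`, the guard `eps` and the sample values are assumptions.

```python
import math

def khm_objective(points, centers, p=2.0, eps=1e-12):
    """Sum over points of K / sum_k (1 / d(x, c_k)**p),
    i.e. the harmonic mean of the p-th power distances to the K centers."""
    k = len(centers)
    total = 0.0
    for x in points:
        # eps guards against division by zero when a point sits on a center.
        inv_sum = sum(1.0 / max(math.dist(x, c), eps) ** p for c in centers)
        total += k / inv_sum
    return total

# Made-up 1-D data: two points and two centers.
value = khm_objective([(0.0,), (3.0,)], [(1.0,), (-1.0,)])
```

Every center contributes to every point's term, so a poorly placed center still receives a gradient signal from distant points.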
Abstract: The paper presents an analysis of linkages and
structures of co-operation, and of their intensity, as the potential for the
establishment of clusters in Central and Eastern (Pannonian)
Croatia. It starts from a theoretical elaboration of the need for
entrepreneurs to organize through the cluster model and of the terms of
their self-actualization, related to the importance of traditional values
in terms of benefits and social capital, and assesses where companies now
stand, in order to show the need for them to create their own identity through
clustering. The paper analyzes the institutional dimension of social capital, in which the
public sector is best placed to create the social structure of
clusters, and the social dimension of social capital in terms of trust,
cooperation and networking. It examines to what extent trust
and coherence are present between companies in Brod-Posavina
and Požega-Slavonia Counties, expressed through their readiness for
inclusion in clusters in the NUTS II region of Central and Eastern
(Pannonian) Croatia as a homogeneous economic entity, with
emphasis on the limiting factors that stand in the way of greater
competitiveness.
Abstract: In this article, a clustering-based technique has been developed and implemented for Short Term Load Forecasting. The problem is formulated using Mean Absolute Percentage Error (MAPE) as the objective function, with the data matrix and cluster size as optimization variables. The designed model uses two temperature variables. It is compared with a six-input Radial Basis Function Neural Network (RBFNN) and a Fuzzy Inference Neural Network (FINN) on data from the same system for the same time period. The fuzzy inference system has the network structure and the training procedure of a neural network, which initially creates a rule base from existing historical load data. It is observed that the proposed clustering-based model gives better forecasting accuracy than the other two methods. Test results also indicate that the RBFNN can forecast future loads with accuracy comparable to that of the proposed method, whereas the training time required in the case of FINN is much less.
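For reference, MAPE is the mean of the absolute relative errors, expressed as a percentage. A minimal sketch, with hypothetical load values and the assumption that no actual value is zero:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent.
    Assumes equal-length sequences and no zero entries in `actual`."""
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)

# Hypothetical hourly loads (MW) and their forecasts.
error_pct = mape([100.0, 200.0], [110.0, 190.0])
```

Here the relative errors are 10% and 5%, so the result is about 7.5. Because each error is divided by the actual load, MAPE weights a miss on a low-load hour more heavily than the same absolute miss on a peak hour.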
Abstract: In recent years, rapid advances in software and hardware in the field of information technology, along with a digital imaging revolution in the medical domain, have facilitated the generation and storage of large collections of images by hospitals and clinics. Searching these large image collections effectively and efficiently poses significant technical challenges and makes it necessary to construct intelligent retrieval systems. Content-based Image Retrieval (CBIR) consists of retrieving the images most visually similar to a given query image from a database of images [5]. Medical CBIR applications pose unique challenges but at the same time offer many new opportunities. On one hand, while one can easily understand news or sports videos, a medical image is often completely incomprehensible to untrained eyes.
Abstract: Intelligent systems based on machine learning
techniques, such as classification and clustering, are gaining widespread
popularity in real-world applications. This paper presents work on
developing a software system for predicting crop yield, for example
oil-palm yield, from climate and plantation data. At the core of our
system is a method for unsupervised partitioning of data for finding
spatio-temporal patterns in climate data using kernel methods which
offer strength to deal with complex data. This work gets inspiration
from the notion that a non-linear data transformation into some high
dimensional feature space increases the possibility of linear
separability of the patterns in the transformed space. Therefore, it
simplifies exploration of the associated structure in the data. Kernel
methods implicitly perform a non-linear mapping of the input data
into a high dimensional feature space by replacing the inner products
with an appropriate positive definite function. In this paper we
present a robust weighted kernel k-means algorithm incorporating
spatial constraints for clustering the data. The proposed algorithm
can effectively handle noise, outliers and auto-correlation in the
spatial data, for effective and efficient data analysis by exploring
patterns and structures in the data, and thus can be used for
predicting oil-palm yield by analyzing various factors affecting the
yield.
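The implicit mapping can be made concrete: the squared distance between a point and a cluster mean in feature space can be written with kernel evaluations alone. The sketch below shows only this unweighted distance, without the weighting, robustness or spatial-constraint terms of the proposed algorithm. The RBF kernel, `gamma` and the function names are assumptions.

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel, a standard positive-definite choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_dist2(x, cluster, kernel=rbf):
    """Squared feature-space distance from x to the mean of `cluster`:
    k(x, x) - (2/n) * sum_y k(x, y) + (1/n^2) * sum_{y, z} k(y, z)."""
    n = len(cluster)
    cross = sum(kernel(x, y) for y in cluster) / n
    within = sum(kernel(y, z) for y in cluster for z in cluster) / (n * n)
    return kernel(x, x) - 2.0 * cross + within

def linear(x, y):
    """Plain inner product; with it the formula reduces to Euclidean distance."""
    return sum(a * b for a, b in zip(x, y))

# With the linear kernel, the distance to the centroid (1, 1) of the cluster is 1.
d2 = kernel_dist2((1.0, 0.0), [(0.0, 0.0), (2.0, 2.0)], kernel=linear)
```

Kernel k-means assigns each point to the cluster with the smallest such distance, so the feature-space means never have to be computed explicitly.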