Hierarchical Clustering Analysis with SOM Networks

This work presents a neural network model for the clustering analysis of data based on Self Organizing Maps (SOM). The model evolves during the training stage towards a hierarchical structure according to the input requirements. The hierarchical structure symbolizes a specialization tool that provides refinements of the classification process. The structure behaves like a single map with different resolutions depending on the region to analyze. The benefits and performance of the algorithm are discussed in application to the Iris dataset, a classical example for pattern recognition.

Iterative Clustering Algorithm for Analyzing Temporal Patterns of Gene Expression

Microarray experiments are information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. For biologists, a key aim when analyzing microarray data is to group genes based on the temporal patterns of their expression levels. In this paper, we used an iterative clustering method to find temporal patterns of gene expression. We evaluated the performance of this method by applying it to real sporulation data and simulated data. The patterns obtained using the iterative clustering were found to be superior to those obtained using existing clustering algorithms.

Family Communication Patterns between Muslim and Santal Communities in Rural Bangladesh: A Cross-Cultural Perspective

This study compares family communication patterns in association with family socio-cultural status, especially marriage and family pattern, and couples- socio-economic status between Muslim and Santal communities in rural Bangladesh. A total of 288 couples, 145 couples from the Muslim and 143 couples from the Santal were randomly selected through cluster sampling procedure from Kalna village situated in Tanore Upazila of Rajshahi district of Bangladesh, where both the communities dwell as neighbors. In order to collect data from the selected samples, interview method with semistructural questionnaire schedule was applied. The responses given by the respondents were analyzed by Pearson-s chi-squire test and bivariate correlation techniques. The results of Pearson-s chi-squire test revealed that family communication patterns (X2= 25. 90, df= 2, p0.05) were significantly different between the Muslim and Santal communities. In addition, Spearman-s bivariate correlation coefficients suggested that among the exogenous factors, family type (rs=.135, p

Structural and Electronic Characterization of Supported Ni and Au Catalysts used in Environment Protection Determined by XRD,XAS and XPS methods

The nickel and gold nanoclusters as supported catalysts were analyzed by XAS, XRD and XPS in order to determine their local, global and electronic structure. The present study has pointed out a strong deformation of the local structure of the metal, due to its interaction with oxide supports. The average particle size, the mean squares of the microstrain, the particle size distribution and microstrain functions of the supported Ni and Au catalysts were determined by XRD method using Generalized Fermi Function for the X-ray line profiles approximation. Based on EXAFS analysis we consider that the local structure of the investigated systems is strongly distorted concerning the atomic number pairs. Metal-support interaction is confirmed by the shape changes of the probability densities of electron transitions: Ni K edge (1s → continuum and 2p), Au LIII-edge (2p3/2 → continuum, 6s, 6d5/2 and 6d3/2). XPS investigations confirm the metal-support interaction at their interface.

Improved Artificial Immune System Algorithm with Local Search

The Artificial immune systems algorithms are Meta heuristic optimization method, which are used for clustering and pattern recognition applications are abundantly. These algorithms in multimodal optimization problems are more efficient than genetic algorithms. A major drawback in these algorithms is their slow convergence to global optimum and their weak stability can be considered in various running of these algorithms. In this paper, improved Artificial Immune System Algorithm is introduced for the first time to overcome its problems of artificial immune system. That use of the small size of a local search around the memory antibodies is used for improving the algorithm efficiently. The credibility of the proposed approach is evaluated by simulations, and it is shown that the proposed approach achieves better results can be achieved compared to the standard artificial immune system algorithms

Sustainable Solutions for Municipal Solid Waste Management in Thailand

General as well as the MSW management in Thailand is reviewed in this paper. Topics include the MSW generation, sources, composition, and trends. The review, then, moves to sustainable solutions for MSW management, sustainable alternative approaches with an emphasis on an integrated MSW management. Information of waste in Thailand is also given at the beginning of this paper for better understanding of later contents. It is clear that no one single method of MSW disposal can deal with all materials in an environmentally sustainable way. As such, a suitable approach in MSW management should be an integrated approach that could deliver both environmental and economic sustainability. With increasing environmental concerns, the integrated MSW management system has a potential to maximize the useable waste materials as well as produce energy as a by-product. In Thailand, the compositions of waste (86%) are mainly organic waste, paper, plastic, glass, and metal. As a result, the waste in Thailand is suitable for an integrated MSW management. Currently, the Thai national waste management policy starts to encourage the local administrations to gather into clusters to establish central MSW disposal facilities with suitable technologies and reducing the disposal cost based on the amount of MSW generated.

Dual-Link Hierarchical Cluster-Based Interconnect Architecture for 3D Network on Chip

Network on Chip (NoC) has emerged as a promising on chip communication infrastructure. Three Dimensional Integrate Circuit (3D IC) provides small interconnection length between layers and the interconnect scalability in the third dimension, which can further improve the performance of NoC. Therefore, in this paper, a hierarchical cluster-based interconnect architecture is merged with the 3D IC. This interconnect architecture significantly reduces the number of long wires. Since this architecture only has approximately a quarter of routers in 3D mesh-based architecture, the average number of hops is smaller, which leads to lower latency and higher throughput. Moreover, smaller number of routers decreases the area overhead. Meanwhile, some dual links are inserted into the bottlenecks of communication to improve the performance of NoC. Simulation results demonstrate our theoretical analysis and show the advantages of our proposed architecture in latency, throughput and area, when compared with 3D mesh-based architecture.

An Energy-Efficient Distributed Unequal Clustering Protocol for Wireless Sensor Networks

The wireless sensor networks have been extensively deployed and researched. One of the major issues in wireless sensor networks is a developing energy-efficient clustering protocol. Clustering algorithm provides an effective way to prolong the lifetime of a wireless sensor networks. In the paper, we compare several clustering protocols which significantly affect a balancing of energy consumption. And we propose an Energy-Efficient Distributed Unequal Clustering (EEDUC) algorithm which provides a new way of creating distributed clusters. In EEDUC, each sensor node sets the waiting time. This waiting time is considered as a function of residual energy, number of neighborhood nodes. EEDUC uses waiting time to distribute cluster heads. We also propose an unequal clustering mechanism to solve the hot-spot problem. Simulation results show that EEDUC distributes the cluster heads, balances the energy consumption well among the cluster heads and increases the network lifetime.

Visualization and Indexing of Spectral Databases

On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra by nearest neighborhood algorithms and distance based searching methods. Search for nearest neighbors in the spectral space is an NP-hard problem, the computational complexity increases by the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time some kind of indexing could be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirement of estimation algorithms by providing a two dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of spectral database does not only support application of distance and similarity based techniques but enables the utilization of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means the prediction has not to use the high dimension space but can be based on the mapped space too. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance based learning algorithms.

Density Clustering Based On Radius of Data (DCBRD)

Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases rises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, a density based clustering algorithm (DCBRD) is presented, relying on a knowledge acquired from the data by dividing the data space into overlapped regions. The proposed algorithm discovers arbitrary shaped clusters, requires no input parameters and uses the same definitions of DBSCAN algorithm. We performed an experimental evaluation of the effectiveness and efficiency of it, and compared this results with that of DBSCAN. The results of our experiments demonstrate that the proposed algorithm is significantly efficient in discovering clusters of arbitrary shape and size.

Queen-bee Algorithm for Energy Efficient Clusters in Wireless Sensor Networks

Wireless sensor networks include small nodes which have sensing ability; calculation and connection extend themselves everywhere soon. Such networks have source limitation on connection, calculation and energy consumption. So, since the nodes have limited energy in sensor networks, the optimized energy consumption in these networks is of more importance and has created many challenges. The previous works have shown that by organizing the network nodes in a number of clusters, the energy consumption could be reduced considerably. So the lifetime of the network would be increased. In this paper, we used the Queen-bee algorithm to create energy efficient clusters in wireless sensor networks. The Queen-bee (QB) is similar to nature in that the queen-bee plays a major role in reproduction process. The QB is simulated with J-sim simulator. The results of the simulation showed that the clustering by the QB algorithm decreases the energy consumption with regard to the other existing algorithms and increases the lifetime of the network.

2D Gabor Functions and FCMI Algorithm for Flaws Detection in Ultrasonic Images

In this paper we present a new approach to detecting a flaw in T.O.F.D (Time Of Flight Diffraction) type ultrasonic image based on texture features. Texture is one of the most important features used in recognizing patterns in an image. The paper describes texture features based on 2D Gabor functions, i.e., Gaussian shaped band-pass filters, with dyadic treatment of the radial spatial frequency range and multiple orientations, which represent an appropriate choice for tasks requiring simultaneous measurement in both space and frequency domains. The most relevant features are used as input data on a Fuzzy c-mean clustering classifier. The classes that exist are only two: 'defects' or 'no defects'. The proposed approach is tested on the T.O.F.D image achieved at the laboratory and on the industrial field.

Enhanced Clustering Analysis and Visualization Using Kohonen's Self-Organizing Feature Map Networks

Cluster analysis is the name given to a diverse collection of techniques that can be used to classify objects (e.g. individuals, quadrats, species etc). While Kohonen's Self-Organizing Feature Map (SOFM) or Self-Organizing Map (SOM) networks have been successfully applied as a classification tool to various problem domains, including speech recognition, image data compression, image or character recognition, robot control and medical diagnosis, its potential as a robust substitute for clustering analysis remains relatively unresearched. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid and provide a powerful tool for data visualization. In this paper, SOM is used for creating a toroidal mapping of two-dimensional lattice to perform cluster analysis on results of a chemical analysis of wines produced in the same region in Italy but derived from three different cultivators, referred to as the “wine recognition data" located in the University of California-Irvine database. The results are encouraging and it is believed that SOM would make an appealing and powerful decision-support system tool for clustering tasks and for data visualization.

A Survey of Business Component Identification Methods and Related Techniques

With deep development of software reuse, componentrelated technologies have been widely applied in the development of large-scale complex applications. Component identification (CI) is one of the primary research problems in software reuse, by analyzing domain business models to get a set of business components with high reuse value and good reuse performance to support effective reuse. Based on the concept and classification of CI, its technical stack is briefly discussed from four views, i.e., form of input business models, identification goals, identification strategies, and identification process. Then various CI methods presented in literatures are classified into four types, i.e., domain analysis based methods, cohesion-coupling based clustering methods, CRUD matrix based methods, and other methods, with the comparisons between these methods for their advantages and disadvantages. Additionally, some insufficiencies of study on CI are discussed, and the causes are explained subsequently. Finally, it is concluded with some significantly promising tendency about research on this problem.

A New Hybrid RMN Image Segmentation Algorithm

The development of aid's systems for the medical diagnosis is not easy thing because of presence of inhomogeneities in the MRI, the variability of the data from a sequence to the other as well as of other different source distortions that accentuate this difficulty. A new automatic, contextual, adaptive and robust segmentation procedure by MRI brain tissue classification is described in this article. A first phase consists in estimating the density of probability of the data by the Parzen-Rozenblatt method. The classification procedure is completely automatic and doesn't make any assumptions nor on the clusters number nor on the prototypes of these clusters since these last are detected in an automatic manner by an operator of mathematical morphology called skeleton by influence zones detection (SKIZ). The problem of initialization of the prototypes as well as their number is transformed in an optimization problem; in more the procedure is adaptive since it takes in consideration the contextual information presents in every voxel by an adaptive and robust non parametric model by the Markov fields (MF). The number of bad classifications is reduced by the use of the criteria of MPM minimization (Maximum Posterior Marginal).

Sample-Weighted Fuzzy Clustering with Regularizations

Although there have been many researches in cluster analysis to consider on feature weights, little effort is made on sample weights. Recently, Yu et al. (2011) considered a probability distribution over a data set to represent its sample weights and then proposed sample-weighted clustering algorithms. In this paper, we give a sample-weighted version of generalized fuzzy clustering regularization (GFCR), called the sample-weighted GFCR (SW-GFCR). Some experiments are considered. These experimental results and comparisons demonstrate that the proposed SW-GFCR is more effective than the most clustering algorithms.

Validation Testing for Temporal Neural Networks for RBF Recognition

A neuron can emit spikes in an irregular time basis and by averaging over a certain time window one would ignore a lot of information. It is known that in the context of fast information processing there is no sufficient time to sample an average firing rate of the spiking neurons. The present work shows that the spiking neurons are capable of computing the radial basis functions by storing the relevant information in the neurons' delays. One of the fundamental findings of the this research also is that when using overlapping receptive fields to encode the data patterns it increases the network-s clustering capacity. The clustering algorithm that is discussed here is interesting from computer science and neuroscience point of view as well as from a perspective.

Neural Network based Texture Analysis of Liver Tumor from Computed Tomography Images

Advances in clinical medical imaging have brought about the routine production of vast numbers of medical images that need to be analyzed. As a result an enormous amount of computer vision research effort has been targeted at achieving automated medical image analysis. Computed Tomography (CT) is highly accurate for diagnosing liver tumors. This study aimed to evaluate the potential role of the wavelet and the neural network in the differential diagnosis of liver tumors in CT images. The tumors considered in this study are hepatocellular carcinoma, cholangio carcinoma, hemangeoma and hepatoadenoma. Each suspicious tumor region was automatically extracted from the CT abdominal images and the textural information obtained was used to train the Probabilistic Neural Network (PNN) to classify the tumors. Results obtained were evaluated with the help of radiologists. The system differentiates the tumor with relatively high accuracy and is therefore clinically useful.

Statistics of Exon Lengths in Animals, Plants, Fungi, and Protists

Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from the RNA transcripts before translation into a protein. The exon-intron structures of different eukaryotic species are quite different from each other, and the evolution of such structures raises many questions. We try to address some of these questions using statistical analysis of whole genomes. We go through all the protein-coding genes in a genome and study correlations between the net length of all the exons in a gene, the number of the exons, and the average length of an exon. We also take average values of these features for each chromosome and study correlations between those averages on the chromosomal level. Our data show universal features of exon-intron structures common to animals, plants, and protists (specifically, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Cryptococcus neoformans, Homo sapiens, Mus musculus, Oryza sativa, and Plasmodium falciparum). We have verified linear correlation between the number of exons in a gene and the length of a protein coded by the gene, while the protein length increases in proportion to the number of exons. On the other hand, the average length of an exon always decreases with the number of exons. Finally, chromosome clustering based on average chromosome properties and parameters of linear regression between the number of exons in a gene and the net length of those exons demonstrates that these average chromosome properties are genome-specific features.

A Distributed Algorithm for Intrinsic Cluster Detection over Large Spatial Data

Clustering algorithms help to understand the hidden information present in datasets. A dataset may contain intrinsic and nested clusters, the detection of which is of utmost importance. This paper presents a Distributed Grid-based Density Clustering algorithm capable of identifying arbitrary shaped embedded clusters as well as multi-density clusters over large spatial datasets. For handling massive datasets, we implemented our method using a 'sharednothing' architecture where multiple computers are interconnected over a network. Experimental results are reported to establish the superiority of the technique in terms of scale-up, speedup as well as cluster quality.