Clustering Mixed Data Using Non-normal Regression Tree for Process Monitoring

In semiconductor manufacturing, large amounts of data are collected from sensors across multiple facilities. The collected sensor data have differing characteristics that depend on factors such as product type, preceding processes, and recipes. In general, Statistical Quality Control (SQC) methods assume that the data are normally distributed when detecting out-of-control process states. Feeding data with such differing characteristics into SQC together inflates the variation of the data, requires wide control limits, and degrades the ability to detect out-of-control conditions. It is therefore necessary to separate mixed data into groups of similar data for more accurate process control. In this paper, we propose a regression tree whose split algorithm is based on the Pearson distribution, so that non-normal distributions can be handled within a parametric method. The regression tree finds data with similar properties across different variables. Experiments on real semiconductor manufacturing process data show improved fault detection performance.
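
The effect of mixing groups on control limits can be illustrated with a minimal sketch. It assumes simple mean ± 3σ limits and uses made-up sensor readings for two hypothetical product types; it does not implement the Pearson-based regression tree proposed in the paper.

```python
from statistics import mean, pstdev

def control_limits(values, k=3.0):
    """Return (lower, upper) limits as mean +/- k standard deviations."""
    m = mean(values)
    s = pstdev(values)
    return m - k * s, m + k * s

def is_out_of_control(x, limits):
    """True when a reading falls outside the given control limits."""
    lower, upper = limits
    return x < lower or x > upper

# Hypothetical readings from two product types running at different levels.
group_a = [10.0, 10.2, 9.8, 10.1, 9.9]
group_b = [20.0, 20.3, 19.7, 20.1, 19.9]
mixed = group_a + group_b

limits_mixed = control_limits(mixed)
limits_a = control_limits(group_a)

# A reading of 12.0 on a type-A product is clearly abnormal for that group,
# but the limits computed from the mixed data are too wide to flag it.
fault = 12.0
missed_by_mixed = not is_out_of_control(fault, limits_mixed)
caught_by_group = is_out_of_control(fault, limits_a)
```

In this toy data the mixed limits span roughly 0 to 30, while the group-A limits span roughly 9.6 to 10.4, which is why separating the groups first makes the fault detectable.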

An Evaluation of Algorithms for Single-Echo Biosonar Target Classification

A recent neurospiking coding scheme for feature extraction from biosonar echoes of various plants is examined with a variety of stochastic classifiers. The derived feature vectors are employed in well-known stochastic classifiers, including nearest-neighbor, single Gaussian, and a Gaussian mixture with EM optimization. The classifiers' performance is evaluated using cross-validation and bootstrapping techniques. It is shown that the various classifiers perform equivalently and that the modified preprocessing configuration yields considerably improved results.

A Model of Market Segmentation for the Customers of Mellat Bank in Iran

If an organization such as Mellat Bank wants to understand its customer market fully in order to reach its specified goals, it can segment the market and offer the right product package to the right segment. Our objective is to offer a segmentation model for the Iranian banking market from the viewpoint of Mellat Bank. The methodology combines “segmentation on the basis of four part-quality variables” and “segmentation on the basis of difference in means”. The required data are gathered from E-Systems and the researcher's personal observation. The research recommends that the organization first form a four-dimensional matrix with 756 segments using four variables (value-based, behavioral, activity style, and activity level), and then calculate the mean profit for every cell of the matrix at two distinct work levels (α1: normal condition and α2: high-pressure condition). The segments are compared by checking two conditions: (1) homogeneity of every segment with its sub-segments and (2) heterogeneity with other segments. On this basis the necessary segmentation can be carried out. The final recommendation, explained further through an operational example and a feedback algorithm, is to test and update the model in response to the dynamic environment, technology, and banking system.

Data Mining Classification Methods Applied in Drug Design

Data mining comprises a group of statistical methods used to analyze a set of information, or a data set. It operates with models and algorithms, which are powerful tools with great potential. They help people understand the patterns in a given body of information, so data mining tools have a wide range of applications. For example, in theoretical chemistry, data mining tools can be used to predict molecular properties or to improve computer-assisted drug design. Classification analysis is one of the major data mining methodologies. The aim of this contribution is to create a classification model able to handle a huge data set with high accuracy. For this purpose, logistic regression, Bayesian logistic regression, and random forest models were built using R software. A Bayesian logistic regression model was also created in Latent GOLD software. These classification methods are supervised learning methods. It was necessary to reduce the dimension of the data matrix before constructing the models, so factor analysis (FA) was used. The models were applied to predict the biological activity of molecules that are potential new drug candidates.

A New Self-Adaptive EP Approach for ANN Weights Training

Evolutionary Programming (EP) is a class of Evolutionary Algorithms (EA) in which mutation is the main reproduction operator. This paper presents a novel EP approach for Artificial Neural Network (ANN) learning. The proposed strategy consists of two components: a self-adaptive component, which contains phenotype information, and a dynamic component, which is described by the genotype. Self-adaptation is achieved by the addition of a value, called the network weight, which depends on the total number of hidden layers and the average number of neurons in the hidden layers. The dynamic component changes its value depending on the fitness of the chromosome exposed to mutation. Thus, the mutation step size is controlled by two components, encapsulated in the algorithm, which adjust it according to the characteristics of a predefined ANN architecture and the fitness of a particular chromosome. A comparative analysis of the proposed approach and classical EP (Gaussian mutation) showed that a significant acceleration of the evolution process is achieved by using both phenotype and genotype information in the mutation strategy.

High Quality Speech Coding using Combined Parametric and Perceptual Modules

A novel approach to speech coding using a hybrid architecture is presented. The advantages of parametric and perceptual coding methods are combined to create a speech coding algorithm that provides better signal quality than a traditional CELP parametric codec. Two approaches are discussed. The first is based on selecting voiced signal components, which are encoded using a parametric algorithm, unvoiced components, which are encoded perceptually, and transients, which remain unencoded. The second approach uses perceptual encoding of the residual signal in a CELP codec. The algorithm applied for precise transient selection is described. The signal quality achieved using the proposed hybrid codec is compared to that of several standard speech codecs.

Detecting Subsurface Circular Objects from Low Contrast Noisy Images: Applications in Microscope Image Enhancement

Particle detection in very noisy and low contrast images is an active field of research in image processing. In this article, a method is proposed for the efficient detection and sizing of subsurface spherical particles, which is used for the processing of softly fused Au nanoparticles. Transmission Electron Microscopy is used for imaging the nanoparticles, and the proposed algorithm has been tested with the two-dimensional projected TEM images obtained. Results are compared with the data obtained by transmission optical spectroscopy, as well as with conventional circular object detection algorithms.

Orchestra/Percussion Classification Algorithm for United Speech Audio Coding System

Unified Speech Audio Coding (USAC), the latest MPEG standard for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. Owing to a shortcoming of the system, introducing a well-designed orchestra/percussion classification and modifying the subsequent processing can greatly increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system that extracts only 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than the traditional 13 scales, and uses an Iterative Dichotomiser 3 (ID3) decision tree rather than a more complex learning method. The proposed algorithm therefore has lower computational complexity than most existing algorithms. Considering that frequent changes of attributes may lead to quality loss in the recovered audio signal, this paper also designs a modified subsequent process that helps the whole classification system reach an accuracy as high as 97%, which is comparable to the classical 99%.
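
The attribute selection at the heart of ID3 can be sketched as follows. This is a minimal illustration of the entropy and information-gain criterion only; the feature names, the low/high discretization of the MFCC values, and the four sample rows are assumptions made for the example and are not taken from the paper.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on a categorical attribute."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attrs):
    """ID3 picks the attribute with the highest information gain at each node."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))

# Hypothetical frames with two discretized MFCC features (illustrative only).
rows = [
    {"mfcc1": "low", "mfcc2": "low"},
    {"mfcc1": "low", "mfcc2": "high"},
    {"mfcc1": "high", "mfcc2": "low"},
    {"mfcc1": "high", "mfcc2": "high"},
]
labels = ["orchestra", "orchestra", "percussion", "percussion"]

root = best_attribute(rows, labels, ["mfcc1", "mfcc2"])
```

Here `mfcc1` separates the two classes perfectly (gain of 1 bit) while `mfcc2` carries no information (gain of 0), so `mfcc1` becomes the root split.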

A Novel Genetic Algorithm Designed for Hardware Implementation

A new genetic algorithm, termed the 'optimum individual monogenetic genetic algorithm' (OIMGA), is presented whose properties have been deliberately designed to be well suited to hardware implementation. Specific design criteria were to ensure fast access to the individuals in the population, to keep the required silicon area for hardware implementation to a minimum and to incorporate flexibility in the structure for the targeting of a range of applications. The first two criteria are met by retaining only the current optimum individual, thereby guaranteeing a small memory requirement that can easily be stored in fast on-chip memory. Also, OIMGA can be easily reconfigured to allow the investigation of problems that normally warrant either large GA populations or individuals many genes in length. Local convergence is achieved in OIMGA by retaining elite individuals, while population diversity is ensured by continually searching for the best individuals in fresh regions of the search space. The results given in this paper demonstrate that both the performance of OIMGA and its convergence time are superior to those of a range of existing hardware GA implementations.

A Novel FFT-Based Frequency Offset Estimator for OFDM Systems

This paper proposes a novel frequency offset (FO) estimator for orthogonal frequency division multiplexing (OFDM). Simplicity is the most significant feature of the algorithm, and it can be repeated to achieve acceptable accuracy. The fractional and integer parts of the FO are estimated jointly using the same algorithm. To do so, instead of using conventional algorithms, which usually rely on a correlation function, we use the DFT of the received signal. Complexity is therefore reduced, and synchronization can be performed by the same hardware that is used to demodulate the OFDM symbol. Finally, computer simulation shows that the accuracy of this method is better than that of conventional methods.
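
The general idea of reading a frequency offset from a DFT peak can be sketched as below. This is a generic grid search over DFT magnitudes on a noise-free complex tone, not the estimator proposed in the paper; the function names, the search range, and the step size are assumptions chosen for illustration. The offset is expressed in units of the subcarrier spacing.

```python
import cmath
import math

def make_tone(n, offset):
    """Noise-free complex tone whose frequency equals the offset (in bins)."""
    return [cmath.exp(2j * math.pi * offset * k / n) for k in range(n)]

def dft_at(samples, freq):
    """Evaluate the DFT of the samples at an arbitrary (possibly fractional) bin."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * freq * k / n)
               for k, x in enumerate(samples))

def estimate_offset(samples, search_range=8.0, step=0.05):
    """Return the candidate frequency with the largest DFT magnitude."""
    candidates = int(round(2 * search_range / step)) + 1
    best_freq, best_mag = 0.0, -1.0
    for i in range(candidates):
        freq = -search_range + i * step
        mag = abs(dft_at(samples, freq))
        if mag > best_mag:
            best_freq, best_mag = freq, mag
    return best_freq

received = make_tone(64, 3.25)           # true offset: 3.25 subcarrier spacings
estimate = estimate_offset(received)
integer_part = math.floor(estimate)      # whole subcarrier shift
fractional_part = estimate - integer_part
```

Because a single search over fractional bins returns both parts at once, the integer and fractional offsets come from the same computation, which mirrors the joint estimation described in the abstract.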

Optimization Approaches for a Complex Dairy Farm Simulation Model

This paper describes the optimization of a complex dairy farm simulation model using two quite different optimization methods, the genetic algorithm (GA) and the Lipschitz Branch-and-Bound (LBB) algorithm. These techniques have been used to improve an agricultural system model developed by Dexcel Limited, New Zealand, which provides a detailed representation of pastoral dairying scenarios and contains an 8-dimensional parameter space. The model incorporates sub-models of pasture growth and animal metabolism, which are themselves complex in many cases. Each evaluation of the objective function, a composite 'Farm Performance Index (FPI)', requires simulation of at least a one-year period of farm operation with a daily time-step, and is therefore computationally expensive. The problem of visualizing the objective function (response surface) in high-dimensional spaces is also considered in the context of the farm optimization problem. Adaptations of Sammon mapping and parallel coordinates visualization are described, which help visualize some important properties of the model's output topography. From this study, it is found that the GA requires fewer function evaluations in optimization than the LBB algorithm.

Clustering Categorical Data Using Hierarchies (CLUCDUH)

Clustering large populations is an important problem when the data contain noise and clusters of different shapes. A good clustering algorithm should be efficient and sensitive enough to detect such clusters. Besides space complexity, time complexity also gains importance as the data size grows. Using hierarchies, we developed a new algorithm that splits attributes according to the values they take, choosing the dimension for splitting so as to divide the database into parts that are as nearly equal as possible. At each node we calculate certain descriptive statistical features of the data residing there, and by pruning we generate the natural clusters with a complexity of O(n).

The Negative Effect of Traditional Loops Style on the Performance of Algorithms

A new algorithm called Character-Comparison to Character-Access (CCCA) is developed to test the effect of two factors on the performance of the checking operation in string searching: 1) converting character-comparison and number-comparison into character-access, and 2) the starting point of checking. An experiment is performed using both English text and DNA text of different sizes. The results are compared with five algorithms, namely Naive, BM, Inf_Suf_Pref, Raita, and Cycle. The results suggest that the CCCA algorithm improves the average number of total comparisons by up to 35%. Furthermore, the results suggest that the new CCCA algorithm improves on the clock time required by the other algorithms by 22.13% to 42.33%.
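
For reference, the total-comparisons metric can be made concrete with the Naive baseline named above. The sketch below counts character comparisons in a straightforward left-to-right search; it is only the baseline, not the CCCA algorithm, and the DNA-like sample text is made up.

```python
def naive_search(text, pattern):
    """Return (match positions, number of character comparisons performed)."""
    comparisons = 0
    matches = []
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        j = 0
        # Compare left to right and stop at the first mismatch.
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == m:
            matches.append(start)
    return matches, comparisons

# Hypothetical DNA-like text; each alignment costs at least one comparison.
positions, total_comparisons = naive_search("ACGTACGT", "CGT")
```

On this input the pattern is found at positions 1 and 5 after 10 character comparisons: the two matching alignments cost 3 each and the four mismatching alignments cost 1 each.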

Design of an M-Channel Cosine Modulated Filter Bank by New Cosh Window Based FIR Filters

In this paper, the newly reported Cosh window function is used in the design of the prototype filter for an M-channel Near Perfect Reconstruction (NPR) Cosine Modulated Filter Bank (CMFB). A local search optimization algorithm is used to minimize the distortion parameters by optimizing the coefficients of the prototype filter. Design examples are presented and compared with a recently reported Kaiser window based filter bank design. The results show that the proposed design approach provides lower distortion parameters and better far-end suppression than the recently reported Kaiser window based design.

A Survey: Clustering Ensembles Techniques

Clustering ensembles combine multiple partitions generated by different clustering algorithms into a single clustering solution. They have emerged as a prominent method for improving the robustness, stability, and accuracy of unsupervised classification solutions. Many contributions have been made so far toward finding a consensus clustering, and one of the major problems in clustering ensembles is the choice of consensus function. In this paper, we first introduce clustering ensembles, the representation of multiple partitions, and their challenges, and present a taxonomy of combination algorithms. Second, we describe consensus functions in clustering ensembles, including hypergraph partitioning, the voting approach, mutual information, co-association based functions, and the finite mixture model, and explain their advantages, disadvantages, and computational complexity. Finally, we compare the characteristics of clustering ensemble algorithms from previous work, such as computational complexity, robustness, simplicity, and accuracy, on different datasets.
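
One of the consensus functions listed above, the co-association approach, can be sketched in a few lines. The version below counts how often each pair of points shares a cluster across partitions and then links pairs whose co-association exceeds a threshold; the 0.5 threshold, the connected-components step, and the three example partitions are assumptions for illustration rather than a method taken from the survey.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0 / len(partitions)
    return matrix

def consensus(partitions, threshold=0.5):
    """Group points connected by co-association values above the threshold."""
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first traversal of one connected component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

# Three hypothetical partitions of five points; cluster ids are arbitrary.
partitions = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
final_labels = consensus(partitions)
```

Point 2 is grouped with points 0 and 1 in two of the three partitions, so the consensus assigns it to their cluster even though the second partition disagrees.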

A New Evolutionary Algorithm for Cluster Analysis

Clustering is a very well-known technique in data mining. One of the most widely used clustering techniques is the K-means algorithm. Solutions obtained with this technique depend on the initialization of the cluster centers, and the final solution may converge to a local minimum. To overcome these shortcomings of the K-means algorithm, this paper proposes a hybrid evolutionary algorithm based on a combination of the PSO, SA, and K-means algorithms, called PSO-SA-K, which can find a better cluster partition. Its performance is evaluated on several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA, and K-means, on the partitional clustering problem.
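
The dependence of K-means on its initial centers can be seen in a small one-dimensional sketch. The code below is plain Lloyd-style K-means run from two hand-picked initializations on made-up data; it illustrates the local-minimum problem only and does not implement the PSO-SA-K hybrid.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain K-means on scalars; returns (final centers, sum of squared errors)."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Keep an empty cluster's center where it was.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse

# Hypothetical data with three well-separated pairs of points.
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]

good_centers, good_sse = kmeans_1d(points, [0.0, 10.0, 20.0])
bad_centers, bad_sse = kmeans_1d(points, [0.0, 1.0, 10.5])
```

With the first initialization the algorithm recovers the three pairs (SSE of 1.5). With the second, two centers are stuck on the single points 0 and 1, the third settles at 15.5, and the run converges to a much worse local minimum (SSE of 101).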

Performance Analysis of Learning Automata-Based Routing Algorithms in Sparse Graphs

A number of routing algorithms based on the learning automata technique have been proposed for communication networks. However, there has been little work on the effects of variation in graph sparsity on the performance of these algorithms. In this paper, a comprehensive study investigates the performance of LASPA, the first learning automata based solution to dynamic shortest path routing, across different graph structures with varying sparsity. The sensitivity of the three main performance parameters of the algorithm (the average number of processed nodes, the average number of scanned edges, and the average time per update) to variation in graph sparsity is reported. Simulation results indicate that the LASPA algorithm adapts well to variation in the sparsity of the graph structure and performs much better than the existing dynamic and fixed algorithms on these performance criteria.

Meteorological Data Study and Forecasting Using Particle Swarm Optimization Algorithm

Weather systems use enormously complex combinations of numerical tools for study and forecasting. Unfortunately, due to phenomena in the world climate such as the greenhouse effect, classical models may become insufficient, mostly because they lack adaptation. The weather forecasting problem is therefore well suited to heuristic approaches such as Evolutionary Algorithms. Experimentation with heuristic methods like the Particle Swarm Optimization (PSO) algorithm can lead to new insights or promising models that can be fine-tuned with more focused techniques. This paper describes a PSO approach for the analysis and prediction of data and provides experimental results of the method on real-world meteorological time series.

An Artificial Intelligent Technique for Robust Digital Watermarking in Multiwavelet Domain

In this paper, an artificial intelligence technique for robust digital image watermarking in the multiwavelet domain is proposed. The embedding technique is based on quantization index modulation, and the watermark extraction process does not require the original image. We have developed an optimization technique that uses genetic algorithms to search for optimal quantization steps, improving both the quality of the watermarked image and the robustness of the watermark. In addition, we construct a prediction model based on image moments and a back-propagation neural network to correct an attacked image geometrically before the watermark extraction process begins. The experimental results show that the proposed watermarking algorithm yields a watermarked image with good imperceptibility and a watermark that is very robust against various image processing attacks.
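
Quantization index modulation, on which the embedding is based, can be sketched for scalar coefficients as follows. This is a textbook-style scalar QIM with two quantizers offset by half a step; the coefficient values, the step size of 4.0, and the noise values are assumptions for illustration, and the multiwavelet transform, genetic search, and neural network of the paper are not reproduced.

```python
def qim_embed(coeff, bit, step):
    """Quantize a coefficient onto the lattice associated with the given bit."""
    offset = bit * step / 2.0          # bit 1 uses a lattice shifted by step/2
    return step * round((coeff - offset) / step) + offset

def qim_extract(coeff, step):
    """Blind extraction: pick the bit whose lattice lies closest to the coefficient."""
    d0 = abs(coeff - qim_embed(coeff, 0, step))
    d1 = abs(coeff - qim_embed(coeff, 1, step))
    return 0 if d0 <= d1 else 1

# Hypothetical transform coefficients and watermark bits.
step = 4.0
coeffs = [12.3, -4.7, 30.1, 0.4]
bits = [1, 0, 1, 1]

marked = [qim_embed(c, b, step) for c, b in zip(coeffs, bits)]

# A mild "attack": noise smaller than step/4 in magnitude leaves the bits intact.
noise = [0.6, -0.8, 0.3, -0.5]
attacked = [m + e for m, e in zip(marked, noise)]
recovered = [qim_extract(a, step) for a in attacked]
```

Extraction needs only the step size, not the original coefficients, which matches the blind extraction described above. A larger step tolerates stronger attacks but distorts the host more, and that trade-off is what a search over quantization steps would tune.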