Performance Analysis of Genetic Algorithm with kNN and SVM for Feature Selection in Tumor Classification

Tumor classification is a key area of research in the field of bioinformatics. Microarray technology is commonly used in the study of disease diagnosis using gene expression levels. The main drawback of gene expression data is that it contains thousands of genes and a very few samples. Feature selection methods are used to select the informative genes from the microarray. These methods considerably improve the classification accuracy. In the proposed method, Genetic Algorithm (GA) is used for effective feature selection. Informative genes are identified based on the T-Statistics, Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate solutions of GA are obtained from top-m informative genes. The classification accuracy of k-Nearest Neighbor (kNN) method is used as the fitness function for GA. In this work, kNN and Support Vector Machine (SVM) are used as the classifiers. The experimental results show that the proposed work is suitable for effective feature selection. With the help of the selected genes, GA-kNN method achieves 100% accuracy in 4 datasets and GA-SVM method achieves in 5 out of 10 datasets. The GA with kNN and SVM methods are demonstrated to be an accurate method for microarray based tumor classification.

Discovery of Quantified Hierarchical Production Rules from Large Set of Discovered Rules

Automated discovery of Rule is, due to its applicability, one of the most fundamental and important method in KDD. It has been an active research area in the recent past. Hierarchical representation allows us to easily manage the complexity of knowledge, to view the knowledge at different levels of details, and to focus our attention on the interesting aspects only. One of such efficient and easy to understand systems is Hierarchical Production rule (HPRs) system. A HPR, a standard production rule augmented with generality and specificity information, is of the following form: Decision If < condition> Generality Specificity . HPRs systems are capable of handling taxonomical structures inherent in the knowledge about the real world. This paper focuses on the issue of mining Quantified rules with crisp hierarchical structure using Genetic Programming (GP) approach to knowledge discovery. The post-processing scheme presented in this work uses Quantified production rules as initial individuals of GP and discovers hierarchical structure. In proposed approach rules are quantified by using Dempster Shafer theory. Suitable genetic operators are proposed for the suggested encoding. Based on the Subsumption Matrix(SM), an appropriate fitness function is suggested. Finally, Quantified Hierarchical Production Rules (HPRs) are generated from the discovered hierarchy, using Dempster Shafer theory. Experimental results are presented to demonstrate the performance of the proposed algorithm.

Performance Evaluation of Qos Parameters in Cognitive Radio Using Genetic Algorithm

The efficient use of available licensed spectrum is becoming more and more critical with increasing demand and usage of the radio spectrum. This paper shows how the use of spectrum as well as dynamic spectrum management can be effectively managed and spectrum allocation schemes in the wireless communication systems be implemented and used, in future. This paper would be an attempt towards better utilization of the spectrum. This research will focus on the decision-making process mainly, with an assumption that the radio environment has already been sensed and the QoS requirements for the application have been specified either by the sensed radio environment or by the secondary user itself. We identify and study the characteristic parameters of Cognitive Radio and use Genetic Algorithm for spectrum allocation. Performance evaluation is done using MATLAB toolboxes.

Genetic Programming Approach to Hierarchical Production Rule Discovery

Automated discovery of hierarchical structures in large data sets has been an active research area in the recent past. This paper focuses on the issue of mining generalized rules with crisp hierarchical structure using Genetic Programming (GP) approach to knowledge discovery. The post-processing scheme presented in this work uses flat rules as initial individuals of GP and discovers hierarchical structure. Suitable genetic operators are proposed for the suggested encoding. Based on the Subsumption Matrix(SM), an appropriate fitness function is suggested. Finally, Hierarchical Production Rules (HPRs) are generated from the discovered hierarchy. Experimental results are presented to demonstrate the performance of the proposed algorithm.

Intuition Operator: Providing Genomes with Reason

In this contribution, the use of a new genetic operator is proposed. The main advantage of using this operator is that it is able to assist the evolution procedure to converge faster towards the optimal solution of a problem. This new genetic operator is called ''intuition'' operator. Generally speaking, one can claim that this operator is a way to include any heuristic or any other local knowledge, concerning the problem, that cannot be embedded in the fitness function. Simulation results show that the use of this operator increases significantly the performance of the classic Genetic Algorithm by increasing the convergence speed of its population.

Using Genetic Algorithm to Improve Information Retrieval Systems

This study investigates the use of genetic algorithms in information retrieval. The method is shown to be applicable to three well-known documents collections, where more relevant documents are presented to users in the genetic modification. In this paper we present a new fitness function for approximate information retrieval which is very fast and very flexible, than cosine similarity fitness function.

A Hybrid Approach for Selection of Relevant Features for Microarray Datasets

Developing an accurate classifier for high dimensional microarray datasets is a challenging task due to availability of small sample size. Therefore, it is important to determine a set of relevant genes that classify the data well. Traditionally, gene selection method often selects the top ranked genes according to their discriminatory power. Often these genes are correlated with each other resulting in redundancy. In this paper, we have proposed a hybrid method using feature ranking and wrapper method (Genetic Algorithm with multiclass SVM) to identify a set of relevant genes that classify the data more accurately. A new fitness function for genetic algorithm is defined that focuses on selecting the smallest set of genes that provides maximum accuracy. Experiments have been carried on four well-known datasets1. The proposed method provides better results in comparison to the results found in the literature in terms of both classification accuracy and number of genes selected.

A New Method for Identifying Broken Rotor Bars in Squirrel Cage Induction Motor Based on Particle Swarm Optimization Method

Detection of squirrel cage induction motor (SCIM) broken bars has long been an important but difficult job in the detection area of motor faults. Early detection of this abnormality in the motor would help to avoid costly breakdowns. A new detection method based on particle swarm optimization (PSO) is presented in this paper. Stator current in an induction motor will be measured and characteristic frequency components of faylted rotor will be detected by minimizing a fitness function using pso. Supply frequency and side band frequencies and their amplitudes can be estimated by the proposed method. The proposed method is applied to a faulty motor with one and two broken bars in different loading condition. Experimental results prove that the proposed method is effective and applicable.

Learning Classifier Systems Approach for Automated Discovery of Crisp and Fuzzy Hierarchical Production Rules

This research presents a system for post processing of data that takes mined flat rules as input and discovers crisp as well as fuzzy hierarchical structures using Learning Classifier System approach. Learning Classifier System (LCS) is basically a machine learning technique that combines evolutionary computing, reinforcement learning, supervised or unsupervised learning and heuristics to produce adaptive systems. A LCS learns by interacting with an environment from which it receives feedback in the form of numerical reward. Learning is achieved by trying to maximize the amount of reward received. Crisp description for a concept usually cannot represent human knowledge completely and practically. In the proposed Learning Classifier System initial population is constructed as a random collection of HPR–trees (related production rules) and crisp / fuzzy hierarchies are evolved. A fuzzy subsumption relation is suggested for the proposed system and based on Subsumption Matrix (SM), a suitable fitness function is proposed. Suitable genetic operators are proposed for the chosen chromosome representation method. For implementing reinforcement a suitable reward and punishment scheme is also proposed. Experimental results are presented to demonstrate the performance of the proposed system.

A Simplified and Effective Algorithm Used to Mine Similar Processes: An Illustrated Example

The running logs of a process hold valuable information about its executed activity behavior and generated activity logic structure. Theses informative logs can be extracted, analyzed and utilized to improve the efficiencies of the process's execution and conduction. One of the techniques used to accomplish the process improvement is called as process mining. To mine similar processes is such an improvement mission in process mining. Rather than directly mining similar processes using a single comparing coefficient or a complicate fitness function, this paper presents a simplified heuristic process mining algorithm with two similarity comparisons that are able to relatively conform the activity logic sequences (traces) of mining processes with those of a normalized (regularized) one. The relative process conformance is to find which of the mining processes match the required activity sequences and relationships, further for necessary and sufficient applications of the mined processes to process improvements. One similarity presented is defined by the relationships in terms of the number of similar activity sequences existing in different processes; another similarity expresses the degree of the similar (identical) activity sequences among the conforming processes. Since these two similarities are with respect to certain typical behavior (activity sequences) occurred in an entire process, the common problems, such as the inappropriateness of an absolute comparison and the incapability of an intrinsic information elicitation, which are often appeared in other process conforming techniques, can be solved by the relative process comparison presented in this paper. To demonstrate the potentiality of the proposed algorithm, a numerical example is illustrated.

Surrogate based Evolutionary Algorithm for Design Optimization

Optimization is often a critical issue for most system design problems. Evolutionary Algorithms are population-based, stochastic search techniques, widely used as efficient global optimizers. However, finding optimal solution to complex high dimensional, multimodal problems often require highly computationally expensive function evaluations and hence are practically prohibitive. The Dynamic Approximate Fitness based Hybrid EA (DAFHEA) model presented in our earlier work [14] reduced computation time by controlled use of meta-models to partially replace the actual function evaluation by approximate function evaluation. However, the underlying assumption in DAFHEA is that the training samples for the meta-model are generated from a single uniform model. Situations like model formation involving variable input dimensions and noisy data certainly can not be covered by this assumption. In this paper we present an enhanced version of DAFHEA that incorporates a multiple-model based learning approach for the SVM approximator. DAFHEA-II (the enhanced version of the DAFHEA framework) also overcomes the high computational expense involved with additional clustering requirements of the original DAFHEA framework. The proposed framework has been tested on several benchmark functions and the empirical results illustrate the advantages of the proposed technique.

Simulation of Robotic Arm using Genetic Algorithm and AHP

In this paper, we have proposed a low cost optimized solution for the movement of a three-arm manipulator using Genetic Algorithm (GA) and Analytical Hierarchy Process (AHP). A scheme is given for optimizing the movement of robotic arm with the help of Genetic Algorithm so that the minimum energy consumption criteria can be achieved. As compared to Direct Kinematics, Inverse Kinematics evolved two solutions out of which the best-fit solution is selected with the help of Genetic Algorithm and is kept in search space for future use. The Inverse Kinematics, Fitness Value evaluation and Binary Encoding like tasks are simulated and tested. Although, three factors viz. Movement, Friction and Least Settling Time (or Min. Vibration) are used for finding the Fitness Function / Fitness Values, however some more factors can also be considered.

A Comparison between Heuristic and Meta-Heuristic Methods for Solving the Multiple Traveling Salesman Problem

The multiple traveling salesman problem (mTSP) can be used to model many practical problems. The mTSP is more complicated than the traveling salesman problem (TSP) because it requires determining which cities to assign to each salesman, as well as the optimal ordering of the cities within each salesman's tour. Previous studies proposed that Genetic Algorithm (GA), Integer Programming (IP) and several neural network (NN) approaches could be used to solve mTSP. This paper compared the results for mTSP, solved with Genetic Algorithm (GA) and Nearest Neighbor Algorithm (NNA). The number of cities is clustered into a few groups using k-means clustering technique. The number of groups depends on the number of salesman. Then, each group is solved with NNA and GA as an independent TSP. It is found that k-means clustering and NNA are superior to GA in terms of performance (evaluated by fitness function) and computing time.

A Multi-objective Fuzzy Optimization Method of Resource Input Based on Genetic Algorithm

With the increasing complexity of engineering problems, the traditional, single-objective and deterministic optimization method can not meet people-s requirements. A multi-objective fuzzy optimization model of resource input is built for M chlor-alkali chemical eco-industrial park in this paper. First, the model is changed into the form that can be solved by genetic algorithm using fuzzy theory. And then, a fitness function is constructed for genetic algorithm. Finally, a numerical example is presented to show that the method compared with traditional single-objective optimization method is more practical and efficient.

FIR Filter Design via Linear Complementarity Problem, Messy Genetic Algorithm, and Ising Messy Genetic Algorithm

In this paper the design of maximally flat linear phase finite impulse response (FIR) filters is considered. The problem is handled with totally two different approaches. The first one is completely deterministic numerical approach where the problem is formulated as a Linear Complementarity Problem (LCP). The other one is based on a combination of Markov Random Fields (MRF's) approach with messy genetic algorithm (MGA). Markov Random Fields (MRFs) are a class of probabilistic models that have been applied for many years to the analysis of visual patterns or textures. Our objective is to establish MRFs as an interesting approach to modeling messy genetic algorithms. We establish a theoretical result that every genetic algorithm problem can be characterized in terms of a MRF model. This allows us to construct an explicit probabilistic model of the MGA fitness function and introduce the Ising MGA. Experimentations done with Ising MGA are less costly than those done with standard MGA since much less computations are involved. The least computations of all is for the LCP. Results of the LCP, random search, random seeded search, MGA, and Ising MGA are discussed.

Genetic Algorithm Parameters Optimization for Bi-Criteria Multiprocessor Task Scheduling Using Design of Experiments

Multiprocessor task scheduling is a NP-hard problem and Genetic Algorithm (GA) has been revealed as an excellent technique for finding an optimal solution. In the past, several methods have been considered for the solution of this problem based on GAs. But, all these methods consider single criteria and in the present work, minimization of the bi-criteria multiprocessor task scheduling problem has been considered which includes weighted sum of makespan & total completion time. Efficiency and effectiveness of genetic algorithm can be achieved by optimization of its different parameters such as crossover, mutation, crossover probability, selection function etc. The effects of GA parameters on minimization of bi-criteria fitness function and subsequent setting of parameters have been accomplished by central composite design (CCD) approach of response surface methodology (RSM) of Design of Experiments. The experiments have been performed with different levels of GA parameters and analysis of variance has been performed for significant parameters for minimisation of makespan and total completion time simultaneously.

Synthesis of Logic Circuits Using Fractional-Order Dynamic Fitness Functions

This paper analyses the performance of a genetic algorithm using a new concept, namely a fractional-order dynamic fitness function, for the synthesis of combinational logic circuits. The experiments reveal superior results in terms of speed and convergence to achieve a solution.

Construct Pairwise Test Suites Based on the Bak-Sneppen Model of Biological Evolution

Pairwise testing, which requires that every combination of valid values of each pair of system factors be covered by at lease one test case, plays an important role in software testing since many faults are caused by unexpected 2-way interactions among system factors. Although meta-heuristic strategies like simulated annealing can generally discover smaller pairwise test suite, they may cost more time to perform search, compared with greedy algorithms. We propose a new method, improved Extremal Optimization (EO) based on the Bak-Sneppen (BS) model of biological evolution, for constructing pairwise test suites and define fitness function according to the requirement of improved EO. Experimental results show that improved EO gives similar size of resulting pairwise test suite and yields an 85% reduction in solution time over SA.

Learning Classifier Systems Approach for Automated Discovery of Censored Production Rules

In the recent past Learning Classifier Systems have been successfully used for data mining. Learning Classifier System (LCS) is basically a machine learning technique which combines evolutionary computing, reinforcement learning, supervised or unsupervised learning and heuristics to produce adaptive systems. A LCS learns by interacting with an environment from which it receives feedback in the form of numerical reward. Learning is achieved by trying to maximize the amount of reward received. All LCSs models more or less, comprise four main components; a finite population of condition–action rules, called classifiers; the performance component, which governs the interaction with the environment; the credit assignment component, which distributes the reward received from the environment to the classifiers accountable for the rewards obtained; the discovery component, which is responsible for discovering better rules and improving existing ones through a genetic algorithm. The concatenate of the production rules in the LCS form the genotype, and therefore the GA should operate on a population of classifier systems. This approach is known as the 'Pittsburgh' Classifier Systems. Other LCS that perform their GA at the rule level within a population are known as 'Mitchigan' Classifier Systems. The most predominant representation of the discovered knowledge is the standard production rules (PRs) in the form of IF P THEN D. The PRs, however, are unable to handle exceptions and do not exhibit variable precision. The Censored Production Rules (CPRs), an extension of PRs, were proposed by Michalski and Winston that exhibit variable precision and supports an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: IF P THEN D UNLESS C, where Censor C is an exception to the rule. Such rules are employed in situations, in which conditional statement IF P THEN D holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions, when the resources needed to establish its presence are tight or there is simply no information available as to whether it holds or not. Thus, the IF P THEN D part of CPR expresses important information, while the UNLESS C part acts only as a switch and changes the polarity of D to ~D. In this paper Pittsburgh style LCSs approach is used for automated discovery of CPRs. An appropriate encoding scheme is suggested to represent a chromosome consisting of fixed size set of CPRs. Suitable genetic operators are designed for the set of CPRs and individual CPRs and also appropriate fitness function is proposed that incorporates basic constraints on CPR. Experimental results are presented to demonstrate the performance of the proposed learning classifier system.

Distortion Estimation in Digital Image Watermarking using Genetic Programming

This paper introduces a technique of distortion estimation in image watermarking using Genetic Programming (GP). The distortion is estimated by considering the problem of obtaining a distorted watermarked signal from the original watermarked signal as a function regression problem. This function regression problem is solved using GP, where the original watermarked signal is considered as an independent variable. GP-based distortion estimation scheme is checked for Gaussian attack and Jpeg compression attack. We have used Gaussian attacks of different strengths by changing the standard deviation. JPEG compression attack is also varied by adding various distortions. Experimental results demonstrate that the proposed technique is able to detect the watermark even in the case of strong distortions and is more robust against attacks.