Feature Weighting and Selection - A Novel Genetic Evolutionary Approach

A feature weighting and selection method is proposed which uses the structure of a weightless neuron and exploits the principles that govern the operation of Genetic Algorithms and Evolution. Features are coded onto chromosomes in a novel way which allows weighting information regarding the features to be directly inferred from the gene values. The proposed method is significant in that it addresses several problems concerned with algorithms for feature selection and weighting as well as providing significant advantages such as speed, simplicity and suitability for real-time systems.

Evolutionary Approach for Automated Discovery of Censored Production Rules

In the recent past, there has been an increasing interest in applying evolutionary methods to Knowledge Discovery in Databases (KDD) and a number of successful applications of Genetic Algorithms (GA) and Genetic Programming (GP) to KDD have been demonstrated. The most predominant representation of the discovered knowledge is the standard Production Rules (PRs) in the form If P Then D. The PRs, however, are unable to handle exceptions and do not exhibit variable precision. The Censored Production Rules (CPRs), an extension of PRs, were proposed by Michalski & Winston that exhibit variable precision and supports an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: If P Then D Unless C, where C (Censor) is an exception to the rule. Such rules are employed in situations, in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions, when the resources needed to establish its presence are tight or there is simply no information available as to whether it holds or not. Thus, the 'If P Then D' part of the CPR expresses important information, while the Unless C part acts only as a switch and changes the polarity of D to ~D. This paper presents a classification algorithm based on evolutionary approach that discovers comprehensible rules with exceptions in the form of CPRs. The proposed approach has flexible chromosome encoding, where each chromosome corresponds to a CPR. Appropriate genetic operators are suggested and a fitness function is proposed that incorporates the basic constraints on CPRs. Experimental results are presented to demonstrate the performance of the proposed algorithm.

Solving Part Type Selection and Loading Problem in Flexible Manufacturing System Using Real Coded Genetic Algorithms – Part I: Modeling

This paper and its companion (Part 2) deal with modeling and optimization of two NP-hard problems in production planning of flexible manufacturing system (FMS), part type selection problem and loading problem. The part type selection problem and the loading problem are strongly related and heavily influence the system-s efficiency and productivity. The complexity of the problems is harder when flexibilities of operations such as the possibility of operation processed on alternative machines with alternative tools are considered. These problems have been modeled and solved simultaneously by using real coded genetic algorithms (RCGA) which uses an array of real numbers as chromosome representation. These real numbers can be converted into part type sequence and machines that are used to process the part types. This first part of the papers focuses on the modeling of the problems and discussing how the novel chromosome representation can be applied to solve the problems. The second part will discuss the effectiveness of the RCGA to solve various test bed problems.

Enhanced GA-Fuzzy OPF under both Normal and Contingent Operation States

The genetic algorithm (GA) based solution techniques are found suitable for optimization because of their ability of simultaneous multidimensional search. Many GA-variants have been tried in the past to solve optimal power flow (OPF), one of the nonlinear problems of electric power system. The issues like convergence speed and accuracy of the optimal solution obtained after number of generations using GA techniques and handling system constraints in OPF are subjects of discussion. The results obtained for GA-Fuzzy OPF on various power systems have shown faster convergence and lesser generation costs as compared to other approaches. This paper presents an enhanced GA-Fuzzy OPF (EGAOPF) using penalty factors to handle line flow constraints and load bus voltage limits for both normal network and contingency case with congestion. In addition to crossover and mutation rate adaptation scheme that adapts crossover and mutation probabilities for each generation based on fitness values of previous generations, a block swap operator is also incorporated in proposed EGA-OPF. The line flow limits and load bus voltage magnitude limits are handled by incorporating line overflow and load voltage penalty factors respectively in each chromosome fitness function. The effects of different penalty factors settings are also analyzed under contingent state.

A Multi-Level GA Search with Application to the Resource-Constrained Re-Entrant Flow Shop Scheduling Problem

Re-entrant scheduling is an important search problem with many constraints in the flow shop. In the literature, a number of approaches have been investigated from exact methods to meta-heuristics. This paper presents a genetic algorithm that encodes the problem as multi-level chromosomes to reflect the dependent relationship of the re-entrant possibility and resource consumption. The novel encoding way conserves the intact information of the data and fastens the convergence to the near optimal solutions. To test the effectiveness of the method, it has been applied to the resource-constrained re-entrant flow shop scheduling problem. Computational results show that the proposed GA performs better than the simulated annealing algorithm in the measure of the makespan

SeqWord Gene Island Sniffer: a Program to Study the Lateral Genetic Exchange among Bacteria

SeqWord Gene Island Sniffer, a new program for the identification of mobile genetic elements in sequences of bacterial chromosomes is presented. This program is based on the analysis of oligonucleotide usage variations in DNA sequences. 3,518 mobile genetic elements were identified in 637 bacterial genomes and further analyzed by sequence similarity and the functionality of encoded proteins. The results of this study are stored in an open database http://anjie.bi.up.ac.za/geidb/geidbhome. php). The developed computer program and the database provide the information valuable for further investigation of the distribution of mobile genetic elements and virulence factors among bacteria. The program is available for download at www.bi.up.ac.za/SeqWord/sniffer/index.html.

Learning Classifier Systems Approach for Automated Discovery of Crisp and Fuzzy Hierarchical Production Rules

This research presents a system for post processing of data that takes mined flat rules as input and discovers crisp as well as fuzzy hierarchical structures using Learning Classifier System approach. Learning Classifier System (LCS) is basically a machine learning technique that combines evolutionary computing, reinforcement learning, supervised or unsupervised learning and heuristics to produce adaptive systems. A LCS learns by interacting with an environment from which it receives feedback in the form of numerical reward. Learning is achieved by trying to maximize the amount of reward received. Crisp description for a concept usually cannot represent human knowledge completely and practically. In the proposed Learning Classifier System initial population is constructed as a random collection of HPR–trees (related production rules) and crisp / fuzzy hierarchies are evolved. A fuzzy subsumption relation is suggested for the proposed system and based on Subsumption Matrix (SM), a suitable fitness function is proposed. Suitable genetic operators are proposed for the chosen chromosome representation method. For implementing reinforcement a suitable reward and punishment scheme is also proposed. Experimental results are presented to demonstrate the performance of the proposed system.

An Effective Hybrid Genetic Algorithm for Job Shop Scheduling Problem

The job shop scheduling problem (JSSP) is well known as one of the most difficult combinatorial optimization problems. This paper presents a hybrid genetic algorithm for the JSSP with the objective of minimizing makespan. The efficiency of the genetic algorithm is enhanced by integrating it with a local search method. The chromosome representation of the problem is based on operations. Schedules are constructed using a procedure that generates full active schedules. In each generation, a local search heuristic based on Nowicki and Smutnicki-s neighborhood is applied to improve the solutions. The approach is tested on a set of standard instances taken from the literature and compared with other approaches. The computation results validate the effectiveness of the proposed algorithm.

Analysis of the AZF Region in Slovak Men with Azoospermia

Y chromosome microdeletions are the most common genetic cause of male infertility and screening for these microdeletions in azoospermic or severely oligospermic men is now standard practice. Analysis of the Y chromosome in men with azoospermia or severe oligozoospermia has resulted in the identification of three regions in the euchromatic part of the long arm of the human Y chromosome (Yq11) that are frequently deleted in men with otherwise unexplained spermatogenic failure. PCR analysis of microdeletions in the AZFa, AZFb and AZFc regions of the human Y chromosome is an important screening tool. The aim of this study was to analyse the type of microdeletions in men with fertility disorders in Slovakia. We evaluated 227 patients with azoospermia and with normal karyotype. All patient samples were analyzed cytogenetically. For PCR amplification of sequence-tagged sites (STS) of the AZFa, AZFb and AZFc regions of the Y chromosome was used Devyser AZF set. Fluorescently labeled primers for all markers in one multiplex PCR reaction were used and for automated visualization and identification of the STS markers we used genetic analyzer ABi 3500xl (Life Technologies). We reported 13 cases of deletions in the AZF region 5,73%. Particular types of deletions were recorded in each region AZFa,b,c .The presence of microdeletions in the AZFc region was the most frequent. The study confirmed that percentage of microdeletions in the AZF region is low in Slovak azoospermic patients, but important from a prognostic view.

A Hybrid Genetic Algorithm for the Sequence Dependent Flow-Shop Scheduling Problem

Flow-shop scheduling problem (FSP) deals with the scheduling of a set of jobs that visit a set of machines in the same order. The FSP is NP-hard, which means that an efficient algorithm for solving the problem to optimality is unavailable. To meet the requirements on time and to minimize the make-span performance of large permutation flow-shop scheduling problems in which there are sequence dependent setup times on each machine, this paper develops one hybrid genetic algorithms (HGA). Proposed HGA apply a modified approach to generate population of initial chromosomes and also use an improved heuristic called the iterated swap procedure to improve initial solutions. Also the author uses three genetic operators to make good new offspring. The results are compared to some recently developed heuristics and computational experimental results show that the proposed HGA performs very competitively with respect to accuracy and efficiency of solution.

The Design of Self-evolving Artificial Immune System II for Permutation Flow-shop Problem

Artificial Immune System is adopted as a Heuristic Algorithm to solve the combinatorial problems for decades. Nevertheless, many of these applications took advantage of the benefit for applications but seldom proposed approaches for enhancing the efficiency. In this paper, we continue the previous research to develop a Self-evolving Artificial Immune System II via coordinating the T and B cell in Immune System and built a block-based artificial chromosome for speeding up the computation time and better performance for different complexities of problems. Through the design of Plasma cell and clonal selection which are relative the function of the Immune Response. The Immune Response will help the AIS have the global and local searching ability and preventing trapped in local optima. From the experimental result, the significant performance validates the SEAIS II is effective when solving the permutation flows-hop problems.

Design of Gravity Dam by Genetic Algorithms

The design of a gravity dam is performed through an interactive process involving a preliminary layout of the structure followed by a stability and stress analysis. This study presents a method to define the optimal top width of gravity dam with genetic algorithm. To solve the optimization task (minimize the cost of the dam), an optimization routine based on genetic algorithms (GAs) was implemented into an Excel spreadsheet. It was found to perform well and GA parameters were optimized in a parametric study. Using the parameters found in the parametric study, the top width of gravity dam optimization was performed and compared to a gradient-based optimization method (classic method). The accuracy of the results was within close proximity. In optimum dam cross section, the ratio of is dam base to dam height is almost equal to 0.85, and ratio of dam top width to dam height is almost equal to 0.13. The computerized methodology may provide the help for computation of the optimal top width for a wide range of height of a gravity dam.

Genetic Folding: Analyzing the Mercer-s Kernels Effect in Support Vector Machine using Genetic Folding

Genetic Folding (GF) a new class of EA named as is introduced for the first time. It is based on chromosomes composed of floating genes structurally organized in a parent form and separated by dots. Although, the genotype/phenotype system of GF generates a kernel expression, which is the objective function of superior classifier. In this work the question of the satisfying mapping-s rules in evolving populations is addressed by analyzing populations undergoing either Mercer-s or none Mercer-s rule. The results presented here show that populations undergoing Mercer-s rules improve practically models selection of Support Vector Machine (SVM). The experiment is trained multi-classification problem and tested on nonlinear Ionosphere dataset. The target of this paper is to answer the question of evolving Mercer-s rule in SVM addressed using either genetic folding satisfied kernel-s rules or not applied to complicated domains and problems.

Statistics of Exon Lengths in Animals, Plants, Fungi, and Protists

Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from the RNA transcripts before translation into a protein. The exon-intron structures of different eukaryotic species are quite different from each other, and the evolution of such structures raises many questions. We try to address some of these questions using statistical analysis of whole genomes. We go through all the protein-coding genes in a genome and study correlations between the net length of all the exons in a gene, the number of the exons, and the average length of an exon. We also take average values of these features for each chromosome and study correlations between those averages on the chromosomal level. Our data show universal features of exon-intron structures common to animals, plants, and protists (specifically, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Cryptococcus neoformans, Homo sapiens, Mus musculus, Oryza sativa, and Plasmodium falciparum). We have verified linear correlation between the number of exons in a gene and the length of a protein coded by the gene, while the protein length increases in proportion to the number of exons. On the other hand, the average length of an exon always decreases with the number of exons. Finally, chromosome clustering based on average chromosome properties and parameters of linear regression between the number of exons in a gene and the net length of those exons demonstrates that these average chromosome properties are genome-specific features.

Hybridizing Genetic Algorithm with Biased Chance Local Search

This paper explores university course timetabling problem. There are several characteristics that make scheduling and timetabling problems particularly difficult to solve: they have huge search spaces, they are often highly constrained, they require sophisticated solution representation schemes, and they usually require very time-consuming fitness evaluation routines. Thus standard evolutionary algorithms lack of efficiency to deal with them. In this paper we have proposed a memetic algorithm that incorporates the problem specific knowledge such that most of chromosomes generated are decoded into feasible solutions. Generating vast amount of feasible chromosomes makes the progress of search process possible in a time efficient manner. Experimental results exhibit the advantages of the developed Hybrid Genetic Algorithm than the standard Genetic Algorithm.

Cold Hardiness in Near Isogenic Lines of Bread Wheat (Triticum Aestivum L. em. Thell.)

Low temperature (LT) is one of the most abiotic stresses causing loss of yield in wheat (T. aestivum). Four major genes in wheat (Triticum aestivum L.) with the dominant alleles designated Vrn–A1,Vrn–B1,Vrn–D1 and Vrn4, are known to have large effects on the vernalization response, but the effects on cold hardiness are ambiguous. Poor cold tolerance has restricted winter wheat production in regions of high winter stress [9]. It was known that nearly all wheat chromosomes [5] or at least 10 chromosomes of 21 chromosome pairs are important in winter hardiness [15]. The objective of present study was to clarify the role of each chromosome in cold tolerance. With this purpose we used 20 isogenic lines of wheat. In each one of these isogenic lines only a chromosome from ‘Bezostaya’ variety (a winter habit cultivar) was substituted to ‘Capple desprez’ variety. The plant materials were planted in controlled conditions with 20º C and 16 h day length in moderately cold areas of Iran at Karaj Agricultural Research Station in 2006-07 and the acclimation period was completed for about 4 weeks in a cold room with 4º C. The cold hardiness of these isogenic lines was measured by LT50 (the temperature in which 50% of the plants are killed by freezing stress).The experimental design was completely randomized block design (RCBD)with three replicates. The results showed that chromosome 5A had a major effect on freezing tolerance, and then chromosomes 1A and 4A had less effect on this trait. Further studies are essential to understanding the importance of each chromosome in controlling cold hardiness in wheat.

Evolutionary Distance in the Yeast Genome

Whole genome duplication (WGD) increased the number of yeast Saccharomyces cerevisiae chromosomes from 8 to 16. In spite of retention the number of chromosomes in the genome of this organism after WGD to date, chromosomal rearrangement events have caused an evolutionary distance between current genome and its ancestor. Studies under evolutionary-based approaches on eukaryotic genomes have shown that the rearrangement distance is an approximable problem. In the case of S. cerevisiae, we describe that rearrangement distance is accessible by using dedoubled adjacency graph drawn for 55 large paired chromosomal regions originated from WGD. Then, we provide a program extracted from a C program database to draw a dedoubled genome adjacency graph for S. cerevisiae. From a bioinformatical perspective, using the duplicated blocks of current genome in S. cerevisiae, we infer that genomic organization of eukaryotes has the potential to provide valuable detailed information about their ancestrygenome.

Identification of Complex Sense-antisense Gene's Module on 17q11.2 Associated with Breast Cancer Aggressiveness and Patient's Survival

Sense-antisense gene pair (SAGP) is a pair of two oppositely transcribed genes sharing a common region on a chromosome. In the mammalian genomes, SAGPs can be organized in more complex sense-antisense gene architectures (CSAGA) in which at least one gene could share loci with two or more antisense partners. Many dozens of CSAGAs can be found in the human genome. However, CSAGAs have not been systematically identified and characterized in context of their role in human diseases including cancers. In this work we characterize the structural-functional properties of a cluster of 5 genes –TMEM97, IFT20, TNFAIP1, POLDIP2 and TMEM199, termed TNFAIP1 / POLDIP2 module. This cluster is organized as CSAGA in cytoband 17q11.2. Affymetrix U133A&B expression data of two large cohorts (410 atients, in total) of breast cancer patients and patient survival data were used. For the both studied cohorts, we demonstrate (i) strong and reproducible transcriptional co-regulatory patterns of genes of TNFAIP1/POLDIP2 module in breast cancer cell subtypes and (ii) significant associations of TNFAIP1/POLDIP2 CSAGA with amplification of the CSAGA region in breast cancer, (ii) cancer aggressiveness (e.g. genetic grades) and (iv) disease free patient-s survival. Moreover, gene pairs of this module demonstrate strong synergetic effect in the prognosis of time of breast cancer relapse. We suggest that TNFAIP1/ POLDIP2 cluster can be considered as a novel type of structural-functional gene modules in the human genome.

Dynamic Network Routing Method Based on Chromosome Learning

In this paper, we probe into the traffic assignment problem by the chromosome-learning-based path finding method in simulation, which is to model the driver' behavior in the with-in-a-day process. By simply making a combination and a change of the traffic route chromosomes, the driver at the intersection chooses his next route. The various crossover and mutation rules are proposed with extensive examples.

A New Algorithm to Stereo Correspondence Using Rank Transform and Morphology Based On Genetic Algorithm

This paper presents a novel algorithm of stereo correspondence with rank transform. In this algorithm we used the genetic algorithm to achieve the accurate disparity map. Genetic algorithms are efficient search methods based on principles of population genetic, i.e. mating, chromosome crossover, gene mutation, and natural selection. Finally morphology is employed to remove the errors and discontinuities.