Unsupervised Classification of DNA Barcodes Species Using Multi-Library Wavelet Networks

DNA barcodes are a good source of the information needed to classify living species. The classification problem has to be supported by reliable methods and algorithms. Analyzing species regions or entire genomes requires sequence similarity methods. A large set of sequences can be compared simultaneously using Multiple Sequence Alignment, which is known to be NP-complete. However, all of the methods in use are still computationally very expensive and require significant computational infrastructure. Our goal is to build predictive models that are highly accurate and interpretable. Our method avoids the complex problem of form and structure in different classes of organisms. The empirical data and their classification performance are compared with other methods. In this study, we present our system, which consists of three phases. The first phase, called transformation, is composed of three sub-steps: Electron-Ion Interaction Pseudopotential (EIIP) coding of the DNA barcodes, Fourier transform, and power spectrum signal processing. The second phase is an approximation, performed with Multi-Library Wavelet Neural Networks (MLWNN). The third phase, the classification of the DNA barcodes, is carried out by applying a hierarchical classification algorithm.
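
The transformation phase described above can be illustrated with a minimal sketch. It assumes the commonly tabulated EIIP values for the four nucleotides and a plain discrete Fourier transform; the function names `eiip_encode` and `power_spectrum` are hypothetical, and subtracting the mean before the transform is an assumption of this sketch, not a step stated in the abstract.

```python
import cmath

# Commonly tabulated EIIP values per nucleotide (assumed here, not quoted from the abstract).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(seq):
    """Map a DNA string to its numeric EIIP signal (KeyError on symbols other than A, C, G, T)."""
    return [EIIP[base] for base in seq.upper()]


def power_spectrum(signal):
    """Return |X[k]|^2 for k = 0 .. n//2 using a direct O(n^2) discrete Fourier transform."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # assumption: remove the DC component first
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


if __name__ == "__main__":
    barcode = "ATG" * 20  # toy sequence with an exact period of 3
    spectrum = power_spectrum(eiip_encode(barcode))
    peak = max(range(len(spectrum)), key=spectrum.__getitem__)
    print("strongest frequency index:", peak, "of", len(barcode))
```

For real barcodes a fast Fourier transform would replace the quadratic loop; the direct form is used here only to keep the sketch dependency-free.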

Biodegradation of Malathion by Acinetobacter baumannii Strain AFA Isolated from Domestic Sewage in Egypt

Bacterial strains capable of degrading malathion were isolated from domestic sewage by an enrichment culture technique. Three bacterial strains were screened and identified as Acinetobacter baumannii (AFA), Pseudomonas aeruginosa (PS1), and Pseudomonas mendocina (PS2) based on morphological and biochemical identification and 16S rRNA sequence analysis. Acinetobacter baumannii AFA was the most efficient malathion-degrading bacterium, so it was used for further biodegradation study. AFA was able to grow in mineral salt medium (MSM) supplemented with malathion (100 mg/l) as the sole carbon source, and within 14 days 84% of the initial dose was degraded by the isolate, as measured by high performance liquid chromatography. Strain AFA could also degrade other organophosphorus compounds, including diazinon, chlorpyrifos and fenitrothion. The effects of different culture conditions on the degradation of malathion, such as inoculum density, other carbon or nitrogen sources, temperature and shaking, were examined. Degradation of malathion and bacterial cell growth were accelerated when culture media were supplemented with yeast extract, glucose and citrate. The optimum conditions for malathion degradation by strain AFA were an inoculum density of 1.5 × 10^12 CFU/ml at 30°C with shaking. Specific polymerase chain reaction (PCR) primers were designed manually using multiple sequence alignment of the corresponding carboxylesterase enzymes of Acinetobacter species. Sequencing of the amplified PCR product and phylogenetic analysis showed a low degree of homology with the other carboxylesterase enzymes of Acinetobacter strains, suggesting that this enzyme is a novel esterase. The isolated bacterial strains may have a potential role in the bioremediation of malathion contamination.

Vector Space of the Extended Base-triplets over the Galois Field of five DNA Bases Alphabet

A plausible architecture of an ancient genetic code is derived from an extended base-triplet vector space over the Galois field of the extended base alphabet {D, G, A, U, C}, where the letter D represents one or more hypothetical bases with unspecific pairing. We hypothesize that the high degeneracy of a primeval genetic code with five bases, together with the gradual origin and improvement of a primitive DNA repair system, could have made possible the transition from the ancient to the modern genetic code. Our results suggest that the Watson-Crick base pairing and the non-specific base pairing of the hypothetical ancestral base D, which are used to define the sum and product operations, are sufficient to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. In addition, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of present coding DNA sequences could also have existed in ancient coding DNA sequences.
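
As a rough illustration of what a triplet vector space over a five-element Galois field looks like, the sketch below does component-wise arithmetic modulo 5 on base triplets. The numbering D=0, G=1, A=2, U=3, C=4 simply follows the order in which the alphabet is listed and is an assumption of this sketch; the abstract does not give the mapping or the exact operation tables, and the helper names are hypothetical.

```python
# Assumed element order (hypothetical): D=0, G=1, A=2, U=3, C=4.
BASES = "DGAUC"
INDEX = {base: i for i, base in enumerate(BASES)}


def add_triplets(x, y):
    """Component-wise sum of two base triplets in GF(5)."""
    return "".join(BASES[(INDEX[a] + INDEX[b]) % 5] for a, b in zip(x, y))


def scale_triplet(k, x):
    """Multiply every coordinate of a triplet by the scalar k in GF(5)."""
    return "".join(BASES[(k * INDEX[a]) % 5] for a in x)


if __name__ == "__main__":
    # Under the assumed numbering, Watson-Crick partners are additive inverses:
    # G + C = 0 and A + U = 0, so a codon plus its complement gives DDD.
    print(add_triplets("GAU", "CUA"))
    print(scale_triplet(4, "GAU"))  # multiplying by 4 (= -1 mod 5) yields the complement
```

With this ordering the zero vector is DDD, which is one way the non-specific base D could act as the neutral element of the sum.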

Multiple Sequence Alignment Using Optimization Algorithms

Proteins or genes that have similar sequences are likely to perform the same function. One of the most widely used techniques for sequence comparison is sequence alignment. Sequence alignment allows mismatches and insertions/deletions, which represent biological mutations. Sequence alignment is usually performed on only two sequences. Multiple sequence alignment is a natural extension of two-sequence alignment, in which the emphasis is on finding an optimal alignment for a group of sequences. Several applicable techniques were examined in this research, ranging from traditional methods such as dynamic programming to widely used stochastic optimization methods such as Genetic Algorithms (GAs) and Simulated Annealing. A framework combining a Genetic Algorithm with Simulated Annealing is presented to solve the Multiple Sequence Alignment problem. The Genetic Algorithm phase tries to find new regions of the solution space, while Simulated Annealing acts as an alignment improver for any near-optimal solution produced by the GA.
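
The role of Simulated Annealing as an alignment improver can be sketched as follows. This is not the framework from the abstract: the sum-of-pairs objective, the scoring values, the gap-shifting move and the cooling schedule are all assumptions chosen to keep the example small, and the Genetic Algorithm phase that would supply the starting alignment is omitted.

```python
import math
import random


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an alignment (equal-length gapped strings) column by column over all row pairs."""
    score = 0
    for column in zip(*alignment):
        for i in range(len(column)):
            for j in range(i + 1, len(column)):
                a, b = column[i], column[j]
                if a == "-" and b == "-":
                    continue  # gap against gap is not scored
                if a == "-" or b == "-":
                    score += gap
                elif a == b:
                    score += match
                else:
                    score += mismatch
    return score


def neighbour(alignment, rng):
    """Swap one gap with an adjacent symbol in a random row; residue order is preserved."""
    rows = [list(row) for row in alignment]
    row = rows[rng.randrange(len(rows))]
    gaps = [i for i, c in enumerate(row) if c == "-"]
    if gaps:
        i = rng.choice(gaps)
        j = i + rng.choice((-1, 1))
        if 0 <= j < len(row):
            row[i], row[j] = row[j], row[i]
    return ["".join(r) for r in rows]


def anneal(alignment, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Metropolis-style annealing that returns the best alignment seen and its score."""
    rng = random.Random(seed)
    current, current_score = list(alignment), sum_of_pairs(alignment)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        candidate_score = sum_of_pairs(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temperature *= cooling
    return best, best_score


if __name__ == "__main__":
    start = ["ACGT-", "A-CGT", "ACG-T"]
    improved, score = anneal(start)
    print(sum_of_pairs(start), "->", score)
    print("\n".join(improved))
```

Because the move only shifts gaps, every candidate spells out the same ungapped sequences as the input, which is the invariant an alignment improver has to keep.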

Multiple Sequence Alignment Using Three-Dimensional Fragments

Background: DIALIGN is a DNA/protein alignment tool for performing pairwise and multiple alignments through the comparison of gap-free segments (fragments) between sequence pairs. An alignment of two sequences is a chain of fragments, i.e. local gap-free pairwise alignments, with the highest total score. Method: A new approach is defined in this article that uses three-dimensional fragments (i.e. local three-way alignments) in the alignment process instead of two-dimensional ones. These three-dimensional fragments are gap-free alignments consisting of equal-length segments belonging to three distinct sequences. Results: The results showed good improvements over the performance of DIALIGN.
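
To make the notion of a three-dimensional fragment concrete, the sketch below searches exhaustively for the best gap-free three-way fragment of a fixed length. The scoring (one point per matching pair in each column) is an assumption made for illustration and is not DIALIGN's fragment weighting; the function names are hypothetical.

```python
from itertools import product


def fragment_score(s1, s2, s3, i, j, k, length):
    """Count pairwise matches over a gap-free three-way fragment starting at (i, j, k)."""
    score = 0
    for offset in range(length):
        a, b, c = s1[i + offset], s2[j + offset], s3[k + offset]
        score += (a == b) + (a == c) + (b == c)
    return score


def best_three_way_fragment(s1, s2, s3, length):
    """Return (score, i, j, k) of the best fragment, or None if a sequence is too short."""
    best = None
    starts = product(range(len(s1) - length + 1),
                     range(len(s2) - length + 1),
                     range(len(s3) - length + 1))
    for i, j, k in starts:
        score = fragment_score(s1, s2, s3, i, j, k, length)
        if best is None or score > best[0]:
            best = (score, i, j, k)
    return best


if __name__ == "__main__":
    print(best_three_way_fragment("TTACGT", "ACGTAA", "GACGTC", 4))
```

The search is cubic in the sequence lengths, so a practical tool would need to prune candidate start positions rather than enumerate them all.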

Comparison of Phylogenetic Trees of Multiple Protein Sequence Alignment Methods

Multiple sequence alignment is a fundamental part of many bioinformatics applications such as phylogenetic analysis. Many alignment methods have been proposed. Each method gives a different result for the same data set, and consequently generates a different phylogenetic tree. Hence, the chosen alignment method affects the resulting tree. However, in the literature there is no evaluation of multiple alignment methods based on the comparison of their phylogenetic trees. This work evaluates the following eight aligners: ClustalX, T-Coffee, SAGA, MUSCLE, MAFFT, DIALIGN, ProbCons and Align-m, based on the phylogenetic trees (test trees) they produce on a given data set. The Neighbor-Joining method is used to estimate the trees. Three criteria, namely dNNI, dRF and Id_Tree, are established to test the ability of the different alignment methods to produce a test tree close to the reference one (true tree). Results show that the method which produces the most accurate alignment gives the test tree nearest to the reference tree. MUSCLE outperforms all aligners with respect to the three criteria and for all datasets, performing particularly well when sequence identities are within 10-20%. It is followed by T-Coffee at lower sequence identity (30%); tree scores of all methods become similar.
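
Comparing a test tree against a reference tree can be sketched with a split-based distance. The example assumes that dRF denotes the Robinson-Foulds distance, which the abstract does not spell out, and it represents trees as nested tuples of leaf names, a hypothetical format chosen for brevity. Trees are treated as unrooted, as Neighbor-Joining trees usually are.

```python
def leaf_set(tree):
    """Collect the leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the leaves induced by internal edges."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)  # each split is stored as the side without this leaf
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor not in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    reference = ((("A", "B"), "C"), ("D", "E"))
    test_tree = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(reference, test_tree))
```

A distance of zero means the two trees share every internal split, so lower values indicate a test tree closer to the reference.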