MIM: A Species Independent Approach for Classifying Coding and Non-Coding DNA Sequences in Bacterial and Archaeal Genomes

A number of competing methodologies have been developed to identify genes and to classify DNA sequences as coding or non-coding. This classification is fundamental to gene-finding and gene-annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. An information-theoretic measure based on mutual information has shown good accuracy in classifying DNA sequences as coding or non-coding. In this paper we describe a species-independent iterative approach that distinguishes coding from non-coding sequences using the mutual information measure (MIM). A set of sixty prokaryotes is used to extract universal training data. To facilitate comparison with the published results of other researchers, a test set of 51 bacterial and archaeal genomes was used to evaluate MIM. The results show that MIM produces results superior to those previously published while remaining species independent.
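
The abstract does not define the measure itself. As a minimal sketch, assuming the underlying quantity is the classic mutual information function between nucleotides k positions apart, I(k) = Σ p_ab(k) log2[p_ab(k) / (p_a p_b)], the following Python computes I(k) for a single sequence. Coding DNA is commonly reported to show a period-3 pattern in this function because of codon structure. The function names `mutual_information` and `mi_profile` are hypothetical; the paper's exact MIM formulation, universal training data and iterative classifier are not reproduced here.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Mutual information (bits) between nucleotides k positions apart.

    Only pairs in which both symbols are A, C, G or T are counted; the
    marginals are taken from the same pairs, so the value is a proper
    mutual information (non-negative up to rounding).
    """
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)
             if seq[i] in "ACGT" and seq[i + k] in "ACGT"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def mi_profile(seq, max_k=9):
    """I(k) for k = 1..max_k, e.g. to look for a period-3 pattern."""
    return [mutual_information(seq, k) for k in range(1, max_k + 1)]
```

A perfectly periodic sequence such as "ACG" repeated gives I(3) = log2(3), while a homopolymer carries no information about its neighbours and gives 0.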

Detection of Legionella pneumophila in Cooling Water Systems of Hospitals and Nursing Homes of Kerman City, Iran by Semi-Nested PCR

Legionella pneumophila is involved in more than 95% of cases of severe atypical pneumonia. Infection occurs mainly through inhalation of indoor aerosols from water-coolant systems. Because some Legionella strains may be viable but not culturable, DNA amplification with Taq polymerase followed by semi-nested PCR was used to detect the Legionella-specific 16S rDNA sequence. For this purpose, 1.5-liter water samples were collected, each in a new plastic bottle, from 77 water-coolant systems in four hospitals, two nursing homes and one student hostel in Kerman city, Iran, during the summer season of 2006 (April to August). The samples were filtered under sterile conditions through Millipore membrane filters. DNA was extracted from the membranes and used for PCR to detect Legionella spp. The PCR product was then subjected to semi-nested PCR to detect L. pneumophila. Of the 77 water samples tested by PCR, 30 (39%) were positive in the PCR that detects most Legionella species, and L. pneumophila was detected in 14 (18.2%) samples by semi-nested PCR. These results indicate that the water-coolant systems of hospitals and nursing homes in Kerman city, Iran, are highly contaminated with L. pneumophila, which is a serious concern. We therefore recommend avoiding this type of coolant system in hospitals and nursing homes.

SeqWord Gene Island Sniffer: a Program to Study the Lateral Genetic Exchange among Bacteria

SeqWord Gene Island Sniffer, a new program for identifying mobile genetic elements in bacterial chromosome sequences, is presented. The program is based on the analysis of variations in oligonucleotide usage in DNA sequences. A total of 3,518 mobile genetic elements were identified in 637 bacterial genomes and further analyzed by sequence similarity and by the functions of the encoded proteins. The results of this study are stored in an open database (http://anjie.bi.up.ac.za/geidb/geidbhome.php). The program and the database provide valuable information for further investigation of the distribution of mobile genetic elements and virulence factors among bacteria. The program is available for download at www.bi.up.ac.za/SeqWord/sniffer/index.html.
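
SeqWord's actual statistics are not given in the abstract. The sketch below illustrates only the general idea of oligonucleotide-usage analysis, using a simple stand-in statistic: tetranucleotide frequencies in sliding windows are compared with genome-wide frequencies, and windows whose usage deviates strongly are flagged as candidate islands. The L1 distance, window size, step, threshold and all function names are hypothetical choices, not the program's.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Normalized frequencies of all 4**k ACGT k-mers (others are ignored)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def usage_distance(window_freqs, genome_freqs):
    """L1 distance between two k-mer distributions (0 = identical, 2 = disjoint)."""
    return sum(abs(window_freqs[m] - genome_freqs[m]) for m in genome_freqs)


def scan_for_islands(genome, window=5000, step=1000, k=4, threshold=0.8):
    """Return (start, distance) for windows whose k-mer usage deviates strongly."""
    genome_freqs = kmer_frequencies(genome, k)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        freqs = kmer_frequencies(genome[start:start + window], k)
        d = usage_distance(freqs, genome_freqs)
        if d > threshold:
            hits.append((start, round(d, 3)))
    return hits
```

A segment with a very different composition, for example a 5 kb AT repeat inserted into random background sequence, yields a distance far above the threshold, whereas background windows stay below it.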

An Ant-based Clustering System for Knowledge Discovery in DNA Chip Analysis Data

Biological data have several characteristics that strongly differentiate them from typical business data: they are much more complex, usually large, and continuously changing. Until recently, business data have been the main target for discovering trends, patterns or future expectations. However, with the recent rise of biotechnology, the powerful techniques used for analyzing business data are now being applied to biological data. With these advanced technologies at hand, the main trend in biological research is rapidly shifting from structural DNA analysis to understanding the cellular functions of DNA sequences. DNA chips are now used to perform experiments, and researchers apply DNA analysis processes to the resulting data. Clustering, the grouping together of similar entities, is one of the most important of these processes, and many clustering algorithms exist, such as hierarchical clustering, self-organizing maps and K-means clustering. In this paper, we propose a clustering algorithm that imitates an ecosystem and takes into account the features of biological data. We implemented the system using an Ant-Colony clustering algorithm, and the system decides the number of clusters automatically. It processes the input biological data, runs the Ant-Colony algorithm, draws the Topic Map, assigns the genes to clusters and displays the output. We tested the algorithm on test data sets of 100 to 1,000 genes and 24 samples, and the results are promising for applying this algorithm to clustering DNA chip data.
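
The abstract does not describe the authors' Ant-Colony algorithm in detail. The following is a minimal sketch of the classic Lumer-Faieta style of ant-based clustering, offered as an assumption about the general technique rather than the paper's system: ants wander on a toroidal grid, pick up items that are dissimilar to their neighbours with probability (k1/(k1+f))^2, and drop them near similar items with probability (f/(k2+f))^2, where f is the local similarity density. Clusters are read off as connected groups of occupied cells, so their number is not fixed in advance. The function name and all parameter values (grid, alpha, k1, k2) are illustrative, and the Topic Map drawing step is not included.

```python
import math
import random
from collections import deque


def ant_cluster(data, grid=20, n_ants=10, steps=5000,
                alpha=0.5, k1=0.1, k2=0.15, seed=0):
    """Lumer-Faieta-style ant clustering; returns one cluster label per item.

    `data` is a list of equal-length numeric tuples (e.g. expression
    profiles); grid * grid must be at least len(data).
    """
    rng = random.Random(seed)
    cells = [(x, y) for x in range(grid) for y in range(grid)]
    # cell -> item index; every item starts on its own random cell
    board = {c: i for i, c in enumerate(rng.sample(cells, len(data)))}

    def neighbours(cell):
        x, y = cell  # 8-neighbourhood on a toroidal grid
        return [((x + dx) % grid, (y + dy) % grid)
                for dx in (-1, 0, 1) for dy in (-1, 0, 1) if dx or dy]

    def density(item, cell):
        # Local similarity of `item` to the items around `cell`, clipped at 0.
        total = sum(1 - math.dist(data[item], data[board[c]]) / alpha
                    for c in neighbours(cell) if c in board)
        return max(0.0, total / 9)

    ants = [{"cell": rng.choice(cells), "load": None} for _ in range(n_ants)]
    for _ in range(steps):
        for ant in ants:
            ant["cell"] = cell = rng.choice(neighbours(ant["cell"]))  # random walk
            if ant["load"] is None and cell in board:
                f = density(board[cell], cell)
                if rng.random() < (k1 / (k1 + f)) ** 2:  # pick up poorly placed item
                    ant["load"] = board.pop(cell)
            elif ant["load"] is not None and cell not in board:
                f = density(ant["load"], cell)
                if rng.random() < (f / (k2 + f)) ** 2:  # drop next to similar items
                    board[cell] = ant["load"]
                    ant["load"] = None

    # Put down anything still carried on the nearest free cell.
    for ant in ants:
        if ant["load"] is not None:
            x, y = ant["cell"]
            free = min((c for c in cells if c not in board),
                       key=lambda c: max(abs(c[0] - x), abs(c[1] - y)))
            board[free] = ant["load"]

    # Clusters = 8-connected groups of occupied cells (breadth-first search).
    labels, seen, next_label = [None] * len(data), set(), 0
    for start in board:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            c = queue.popleft()
            labels[board[c]] = next_label
            for nb in neighbours(c):
                if nb in board and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        next_label += 1
    return labels
```

The outcome is stochastic: with well-separated groups and enough steps, ants tend to build separate piles for each group, but the number and purity of clusters depend on the seed and parameters.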

A Simulation Software for DNA Computing Algorithms Implementation

The capture of a gel electrophoresis image represents the output of a DNA computing algorithm. Before this image is captured, the computation relies on parallel overlap assembly (POA) and the polymerase chain reaction (PCR), which are the main steps of the algorithm. However, designing the DNA oligonucleotides that represent a problem is complicated and prone to errors. To reduce these errors during the design stage, before the actual in vitro experiment is carried out, simulation software capable of simulating the POA and PCR processes has been developed. The software is designed to simulate problems of any size and complexity, thus saving the cost of errors made during the design process. Information about the DNA sequences during the computing process, as well as the computing output, can be extracted at the same time using the simulation software.
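
The simulator's internals are not described in the abstract. As a hypothetical illustration of sequence-level PCR simulation only (POA is not sketched), the function below returns the amplicons a primer pair would produce, assuming exact primer matches and searching only the given strand of the template. The distinct amplicon lengths are what a gel image would show as bands. All function names are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Amplicons produced by a primer pair on one strand of `template`.

    Assumes exact primer matches: the forward primer must occur as written and
    the reverse primer must occur as its reverse complement further downstream.
    Each forward site is paired with the nearest downstream reverse site.
    """
    rc_reverse = reverse_complement(reverse)
    amplicons = []
    start = template.find(forward)
    while start != -1:
        end = template.find(rc_reverse, start + len(forward))
        if end != -1:
            amplicons.append(template[start:end + len(rc_reverse)])
        start = template.find(forward, start + 1)
    return amplicons


def gel_bands(amplicons):
    """Distinct amplicon lengths, longest first (closest to the wells on a gel)."""
    return sorted({len(a) for a in amplicons}, reverse=True)
```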

A New Edit Distance Method for Finding Similarity in DNA Sequence

The P-Bigram method is a string comparison method based on an internal similarity measure over pairs of characters (bigrams). The edit distance between two strings is the minimum number of elementary editing operations required to transform one string into the other; the elementary editing operations are deletion, insertion and substitution of two characters. In this paper, we apply the P-Bigram method to solve the similarity problem in DNA sequences. The method provides an efficient algorithm that locates all minimum operations in a string. We implemented the algorithm and found that our program calculated a smaller distance than one string. We develop the P-Bigram edit distance and show how the edit distance, or similarity, can be implemented using dynamic programming. The performance of the proposed approach is evaluated using the number of edits and the percentage similarity.
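
The P-Bigram measure itself is not specified in the abstract, so the sketch below shows only the standard unit-cost edit (Levenshtein) distance computed by dynamic programming, with single-character insertions, deletions and substitutions. The percentage similarity is assumed here to be 100 × (1 - d / max(|a|, |b|)). This is a baseline against which a bigram-based variant could be compared, not the authors' method.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance via dynamic programming (two-row table)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def percent_similarity(a, b):
    """Assumed definition: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else 100.0 * (1 - edit_distance(a, b) / longest)
```

For example, "ACGT" and "AGT" differ by a single deletion, so their distance is 1; "AAAA" and "AATA" differ by one substitution, giving 75% similarity.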

Rice cDNA Encoding PROLM is Capable of Rescuing Salt Sensitive Yeast Phenotypes G19 and Axt3K from Salt Stress

A rice seed expression (cDNA) library, constructed in the Lambda ZAP II® phage from developing grains 10-20 days after flowering, was transformed into yeast for functional complementation assays in three salt-sensitive Saccharomyces cerevisiae mutant strains: CY162, G19 and Axt3K. G19 and Axt3K cells transformed with the pYES vector carrying cDNA inserts showed enhanced salt tolerance compared with cells carrying the empty pYES vector. Sequencing of the cDNA inserts revealed that they encode putative proteins homologous to the rice putative protein PROLM24 (Os06g31070), a prolamin precursor. Expression of this cDNA did not affect yeast growth in the absence of salt. Axt3K and G19 strains expressing PROLM24 were able to grow in up to 400 mM and 600 mM NaCl, respectively. Similarly, the Axt3K mutant expressing PROLM24 showed a comparatively higher growth rate in medium with excess LiCl (50 mM). The observation that PROLM24 expression rescued the salt-sensitive phenotypes of G19 and Axt3K indicates the existence of a regulatory system that ameliorates the effect of salt stress in the transformed yeast mutants. However, the exact function of the cDNA sequence, which shows partial sequence homology to yeast UTR1, is not clear. Although UTR1 is involved in ferrous uptake and iron homeostasis in yeast cells, there is no evidence for a role in Na+ homeostasis. The absence of transmembrane regions in the Os06g31070 protein indicates that salt tolerance is achieved not through direct functional complementation of the mutant genes but through an alternative mechanism.

Analysis of DNA-Recognizing Enzyme Interaction using Deaminated Lesions

Deaminated lesions are produced by nitrosative oxidation of natural nucleobases: uracil (Ura, U) from cytosine (Cyt, C), hypoxanthine (Hyp, H) from adenine (Ade, A), and xanthine (Xan, X) and oxanine (Oxa, O) from guanine (Gua, G). Such damaged nucleobases can cause mutagenic problems, so much attention and effort have been devoted to revealing their mechanisms in vivo and in vitro. In this study, we employed these deaminated lesions as probes for analyzing DNA-binding/recognizing proteins and enzymes. Because the purine lesions Hyp, Oxa and Xan can serve as analogues of guanine, comparing them is informative for analyzing the role of Gua in DNA sequences during DNA-protein interactions. Several DNA oligomers in which Hyp, Oxa or Xan was substituted for Gua were designed to reveal the molecular interactions between DNA and proteins. This approach provided useful information for understanding the molecular mechanisms of DNA-recognizing enzymes, information that had not been obtained with conventional DNA oligomers composed only of natural nucleobases.

Pattern Recognition Techniques Applied to Biomedical Patterns

Pattern recognition is the research area of Artificial Intelligence that studies the operation and design of systems that recognize patterns in data. Important application areas include image analysis, character recognition, fingerprint classification, speech analysis, DNA sequence identification, man and machine diagnostics, person identification and industrial inspection. The interest in improving data-classification systems is independent of the application context. In fact, many studies need to recognize and distinguish groups of different objects, which requires valid tools capable of performing this task. The objective of this article is to present several Artificial Intelligence methodologies for data classification applied to biomedical patterns. In particular, this work deals with the realization of a Computer-Aided Detection (CADe) system that is able to assist the radiologist in identifying types of mammary tumor lesions. As an additional biomedical application of these classification systems, we present a study of blood samples showing how these methods may help distinguish carriers of Thalassemia (Mediterranean Anaemia) from healthy subjects.

Finding Approximate Tandem Repeats with the Burrows-Wheeler Transform

Approximate tandem repeats in a genomic sequence are two or more contiguous, similar copies of a pattern of nucleotides. They are used in DNA mapping, in studying the mechanisms of molecular evolution, in forensic analysis and in research on the diagnosis of inherited diseases. Their functions are still being investigated and are not yet well defined, but growing biological databases, together with tools for identifying these repeats, may lead to the discovery of their specific roles or their correlation with particular features. This paper presents a new approach for finding approximate tandem repeats in a given sequence, in which the similarity between consecutive repeats is measured using the Hamming distance. It extends a method for finding exact tandem repeats in DNA sequences based on the Burrows-Wheeler transform.
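
The BWT-based search is not detailed in the abstract. The brute-force sketch below, which does not use the Burrows-Wheeler transform, applies the criterion the abstract describes: consecutive copies of a period-p pattern are accepted while each copy differs from the next by at most a given Hamming distance. The function names, parameters and the (start, period, copies) output format are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def approximate_tandem_repeats(seq, min_period=2, max_period=6,
                               max_mismatches=1, min_copies=2):
    """Brute-force search for approximate tandem repeats.

    For each period p, extends a run of copies while each copy differs from
    the next by at most `max_mismatches` (Hamming distance). Returns tuples
    (start, period, copies). Periods with p <= max_mismatches are skipped,
    because two copies that short would always match.
    """
    results = []
    n = len(seq)
    for p in range(max(min_period, max_mismatches + 1), max_period + 1):
        i = 0
        while i + 2 * p <= n:
            copies, j = 1, i
            while (j + 2 * p <= n and
                   hamming(seq[j:j + p], seq[j + p:j + 2 * p]) <= max_mismatches):
                copies += 1
                j += p
            if copies >= min_copies:
                results.append((i, p, copies))
                i = j + p  # continue after the last copy of this run
            else:
                i += 1
    return results
```

For example, "ACGTT", "ACGAT", "ACGTT" in a row form one approximate tandem repeat of period 5 with three copies, since each copy differs from the next at a single position.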