On Reversal and Transposition Medians

In recent years, the genomes of more and more species have been sequenced, providing data for phylogenetic reconstruction based on genome rearrangement measures. A main task in all such phylogenetic reconstruction algorithms is to solve the median-of-three problem. Although this problem is NP-hard even for the simplest distance measures, there are exact algorithms for the breakpoint median and the reversal median that are fast enough for practical use. In this paper, this approach is extended to the transposition median as well as to the weighted reversal and transposition median. Although no exact polynomial-time algorithm is known even for the pairwise distances, we show that in most cases these problems can be solved exactly within reasonable time by using a branch and bound algorithm.
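To make the median-of-three problem concrete, the following is a minimal sketch for the simplest measure mentioned above, the breakpoint distance. It assumes genomes are signed linear permutations written as tuples of integers, and it finds a median by plain exhaustive search, which is feasible only for very small genomes. It is not the branch and bound algorithm of the paper; the function names (`adjacencies`, `breakpoint_distance`, `brute_force_median`) are illustrative assumptions.

```python
from itertools import permutations, product


def adjacencies(perm):
    """Return the adjacencies of a signed linear permutation, in both
    reading directions, after framing it with 0 and n + 1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    adj = set()
    for a, b in zip(ext, ext[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read on the reverse strand
    return adj


def breakpoint_distance(p, q):
    """Count the adjacencies of p that are not conserved in q."""
    q_adj = adjacencies(q)
    ext = [0] + list(p) + [len(p) + 1]
    return sum((a, b) not in q_adj for a, b in zip(ext, ext[1:]))


def median_score(candidate, genomes):
    """Sum of the distances from the candidate to the given genomes."""
    return sum(breakpoint_distance(candidate, g) for g in genomes)


def brute_force_median(genomes):
    """Try every signed permutation and keep one with minimal score.

    The search space has n! * 2**n candidates, so this is only meant
    for toy inputs (n of about 6 or less)."""
    n = len(genomes[0])
    best, best_score = None, None
    for order in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            cand = tuple(s * g for s, g in zip(signs, order))
            score = median_score(cand, genomes)
            if best_score is None or score < best_score:
                best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    # Three hypothetical toy genomes over the genes 1..4.
    toy = [(1, 2, 3, 4), (1, -3, -2, 4), (2, 1, 3, 4)]
    print(brute_force_median(toy))
```

A branch and bound method explores the same search space but discards partial candidates whose lower bound on the score already exceeds the best score found so far, which is what makes larger instances tractable.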

Statistics of Exon Lengths in Animals, Plants, Fungi, and Protists

Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from the RNA transcripts before translation into a protein. The exon-intron structures of different eukaryotic species differ considerably, and the evolution of such structures raises many questions. We address some of these questions using statistical analysis of whole genomes. For every protein-coding gene in a genome, we study correlations between the net length of all the exons in the gene, the number of exons, and the average length of an exon. We also average these features over each chromosome and study correlations between those averages at the chromosomal level. Our data show universal features of exon-intron structures common to animals, plants, fungi, and protists (specifically, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Cryptococcus neoformans, Homo sapiens, Mus musculus, Oryza sativa, and Plasmodium falciparum). We have verified a linear correlation between the number of exons in a gene and the length of the protein encoded by the gene: the protein length increases in proportion to the number of exons. On the other hand, the average length of an exon always decreases with the number of exons. Finally, chromosome clustering based on average chromosome properties and on the parameters of the linear regression between the number of exons in a gene and the net length of those exons demonstrates that these average chromosome properties are genome-specific features.
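The per-gene quantities described above can be sketched in a few lines. The example below assumes each gene is given simply as a list of exon lengths in nucleotides, and it uses ordinary least squares and the Pearson coefficient as stand-ins for the regression and correlation analysis; the function names and the toy gene set are hypothetical and do not come from the study.

```python
from math import sqrt


def gene_features(exon_lengths):
    """Return (number of exons, net exon length, average exon length)."""
    n = len(exon_lengths)
    net = sum(exon_lengths)
    return n, net, net / n


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Least-squares slope and intercept of y as a linear function of x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    # Hypothetical toy genes, each a list of exon lengths (made-up values).
    genes = [[900], [400, 350], [300, 280, 310], [250, 240, 260, 230]]
    feats = [gene_features(g) for g in genes]
    counts = [f[0] for f in feats]
    net_lengths = [f[1] for f in feats]
    avg_lengths = [f[2] for f in feats]
    print("fit of net length on exon count:", linear_fit(counts, net_lengths))
    print("r(count, net length):", pearson(counts, net_lengths))
    print("r(count, average exon length):", pearson(counts, avg_lengths))
```

Applied per chromosome, the fitted slope and intercept together with the chromosome averages would give the kind of feature vector on which chromosomes could then be clustered.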

Computational Identification of MicroRNAs and their Targets in two Species of Evergreen Spruce Tree (Picea)

MicroRNAs (miRNAs) are small, non-coding regulatory RNAs about 20 to 24 nucleotides long. Their conservation across organisms makes known miRNAs a good basis for discovering new miRNAs through a comparative genomics approach. This study identified 21 miRNAs from 20 pre-miRNAs belonging to 16 families (miR156, 157, 158, 164, 165, 168, 169, 172, 319, 390, 393, 394, 395, 400, 472 and 861) in evergreen spruce trees (Picea). The miRNA families miR157, 158, 164, 165, 168, 169, 319, 390, 393, 394, 400, 472 and 861 are reported for the first time in Picea. All 20 miRNA precursors form stable minimum free energy stem-loop structures, as their orthologues do in Arabidopsis, and the mature miRNAs reside in the stem portion of the stem-loop structure. Sixteen miRNAs are from Picea glauca and five are from Picea sitchensis. Their targets include transcription factors and growth-related, stress-related, and hypothetical proteins.