CompPSA: A Component-Based Pairwise RNA Secondary Structure Alignment Algorithm

The biological function of an RNA molecule depends
on its structure. The objective of the alignment is finding the
homology between two or more RNA secondary structures. Knowing
the common functionalities between two RNA structures allows
a better understanding and a discovery of other relationships
between them. Besides, identifying non-coding RNAs -that is not
translated into a protein- is a popular application in which RNA
structural alignment is the first step A few methods for RNA
structure-to-structure alignment have been developed. Most of these
methods are partial structure-to-structure, sequence-to-structure, or
structure-to-sequence alignment. Less attention is given in the
literature to the use of efficient RNA structure representation and the
structure-to-structure alignment methods are lacking. In this paper,
we introduce an O(N2) Component-based Pairwise RNA Structure
Alignment (CompPSA) algorithm, where structures are given as
a component-based representation and where N is the maximum
number of components in the two structures. The proposed algorithm
compares the two RNA secondary structures based on their weighted
component features rather than on their base-pair details. Extensive
experiments are conducted illustrating the efficiency of the CompPSA
algorithm when compared to other approaches and on different real
and simulated datasets. The CompPSA algorithm shows an accurate
similarity measure between components. The algorithm gives the
flexibility for the user to align the two RNA structures based on
their weighted features (position, full length, and/or stem length).
Moreover, the algorithm proves scalability and efficiency in time and
memory performance.




References:
[1] D. Gusfield, Algorithms on strings, trees and sequences: computer
science and computational biology. Cambridge University Press, 1997.
[2] D. Sankoff, “Simultaneous solution of the rna folding, alignment
and protosequence problems,” SIAM Journal on Applied Mathematics,
vol. 45, no. 5, pp. 810–825, 1985. [3] S. B. Needleman and C. D. Wunsch, “A general method applicable to
the search for similarities in the amino acid sequence of two proteins,”
Journal of molecular biology, vol. 48, no. 3, pp. 443–453, 1970.
[4] T. F. Smith and M. S. Waterman, “Identification of common molecular
subsequences,” Journal of molecular biology, vol. 147, no. 1, pp.
195–197, 1981.
[5] J. Gorodkin, L. J. Heyer, and G. D. Stormo, “Finding the most significant
common sequence and structure motifs in a set of rna sequences,”
Nucleic Acids Research, vol. 25, no. 18, pp. 3724–3732, 1997.
[6] J. H. Havgaard, R. B. Lyngsø, G. D. Stormo, and J. Gorodkin, “Pairwise
local structural alignment of rna sequences with sequence similarity less
than 40%,” Bioinformatics, vol. 21, no. 9, pp. 1815–1824, 2005.
[7] J. H. Havgaard, R. B. Lyngsø, and J. Gorodkin, “The foldalign web
server for pairwise structural rna alignment and mutual motif search,”
Nucleic acids research, vol. 33, no. suppl 2, pp. W650–W653, 2005.
[8] J. H. Havgaard, E. Torarinsson, and J. Gorodkin, “Fast pairwise
structural rna alignments by pruning of the dynamical programming
matrix,” PLOS computational biology, vol. 3, no. 10, p. e193, 2007.
[9] E. Torarinsson, J. H. Havgaard, and J. Gorodkin, “Multiple structural
alignment and clustering of rna sequences,” Bioinformatics, vol. 23,
no. 8, pp. 926–932, 2007.
[10] M. Bauer, G. W. Klau, and K. Reinert, “Accurate multiple
sequence-structure alignment of rna sequences using combinatorial
optimization,” BMC bioinformatics, vol. 8, no. 1, p. 271, 2007.
[11] S. Heyne, S. Will, M. Beckstette, and R. Backofen, “Lightweight
comparison of rnas based on exact sequence-structure matches,”
Bioinformatics, p. btp065, 2009.
[12] Y. Jiang, W. Xu, L. P. Thompson, R. R. Gutell, and D. P. Miranker,
“R-pass: A fast structure-based rna sequence alignment algorithm,”
in Bioinformatics and Biomedicine (BIBM), 2011 IEEE International
Conference on. IEEE, 2011, pp. 618–622.
[13] Y. Tabei, K. Tsuda, T. Kin, and K. Asai, “Scarna: fast and accurate
structural alignment of rna sequences by matching fixed-length stem
fragments,” Bioinformatics, vol. 22, no. 14, pp. 1723–1729, 2006.
[14] T. K. Wong, K.-L. Wan, B.-Y. Hsu, B. W. Cheung, W.-K. Hon, T.-W.
Lam, and S.-M. Yiu, “Rnasalign: Rna structural alignment system,”
Bioinformatics, vol. 27, no. 15, pp. 2151–2152, 2011.
[15] M. Hochsmann, T. Toller, R. Giegerich, and S. Kurtz, “Local similarity
in rna secondary structures,” in Bioinformatics Conference, 2003. CSB
2003. Proceedings of the 2003 IEEE. IEEE, 2003, pp. 159–168.
[16] M.-Y. Wu, C.-B. Yang, and K.-S. Huang, “Rna secondary structure
alignment based on stem representation,” in Proceedings of the 21st
Workshop on Combinational Mathematics and Computation Theory.
Citeseer, pp. 60–69.
[17] J. Liu, J. T. Wang, J. Hu, and B. Tian, “A method for aligning rna
secondary structures and its application to rna motif detection,” BMC
bioinformatics, vol. 6, no. 1, p. 89, 2005.
[18] C. Zhong and S. Zhang, “Efficient alignment of rna secondary structures
using sparse dynamic programming,” BMC bioinformatics, vol. 14, no. 1,
p. 269, 2013.
[19] G. Badr and M. Turcotte, “Component-based matching for multiple
interacting rna sequences,” in Bioinformatics Research and Applications.
Springer, 2011, pp. 73–86.
[20] H. Arraqibah and G. Badr, “Extended component-based motif
localization for interacting rna structures,” in ISBRA 2013, Charlotte,
US, May 20 2013.
[21] B. Hunter, “Visualization of secondary rna structure prediction
algorithms,” 2006.
[22] I. L. Hofacker, W. Fontana, P. F. Stadler, L. S. Bonhoeffer, M. Tacker,
and P. Schuster, “Fast folding and comparison of rna secondary
structures,” Monatshefte f¨ur Chemie/Chemical Monthly, vol. 125, no. 2,
pp. 167–188, 1994.
[23] S. W. Burge, J. Daub, R. Eberhardt, J. Tate, L. Barquist, E. P. Nawrocki,
S. R. Eddy, P. P. Gardner, and A. Bateman, “Rfam 11.0: 10 years of rna
families,” Nucleic acids research, p. gks1005, 2012.
[24] D. A. Benson, M. Cavanaugh, K. Clark, I. Karsch-Mizrachi, D. J.
Lipman, J. Ostell, and E. W. Sayers, “Genbank,” Nucleic acids research,
p. gks1195, 2012.
[25] A. Alturki, “Component based pair-wise rna secondary structure
alignment algorithm,” Master’s thesis, King Saud University, 2014.
[26] T. M. Lowe and S. R. Eddy, “trnascan-se: a program for improved
detection of transfer rna genes in genomic sequence,” Nucleic acids
research, vol. 25, no. 5, pp. 0955–964, 1997.