Maximum Common Substructure Extraction in RNA Secondary Structures Using Clique Detection Approach

The similarity comparison of RNA secondary structures is important in studying the functions of RNAs. In recent years, most existing tools represent the secondary structures by tree-based presentation and calculate the similarity by tree alignment distance. Different to previous approaches, we propose a new method based on maximum clique detection algorithm to extract the maximum common structural elements in compared RNA secondary structures. A new graph-based similarity measurement and maximum common subgraph detection procedures for comparing purely RNA secondary structures is introduced. Given two RNA secondary structures, the proposed algorithm consists of a process to determine the score of the structural similarity, followed by comparing vertices labelling, the labelled edges and the exact degree of each vertex. The proposed algorithm also consists of a process to extract the common structural elements between compared secondary structures based on a proposed maximum clique detection of the problem. This graph-based model also can work with NC-IUB code to perform the pattern-based searching. Therefore, it can be used to identify functional RNA motifs from database or to extract common substructures between complex RNA secondary structures. We have proved the performance of this proposed algorithm by experimental results. It provides a new idea of comparing RNA secondary structures. This tool is helpful to those who are interested in structural bioinformatics.

Authors:



References:
[1] M. Hochsmann, B. Voss, and R. Giegerich, "Pure multiple RNA
secondary structure alignments: a progressive profile approach," IEEE
Trans. on Computational Biology and Bioinformatics, vol. 1, pp. 53-62,
2004.
[2] J. Liu, J. T. L. Wang, J. Hu, and B. Tian, "A method for aligning RNA
secondary structures and its application to RNA motif detection," BMC
Bioinformatics, vol. 6, pp. 89-109, 2005.
[3] J. Allali and M. F. Sagot, "A new distance for high level RNA secondary
structure comparison," IEEE Trans. on Computational Biology and
Bioinformatics, vol. 2, pp. 1-11, 2005.
[4] G. D. Collins, S. Le and K. Zhang, "A new algorithm for computing
similarity between RNA structures," Information Sciences, vol. 139, pp.
59-77, 2001.
[5] S. Dulucq and L. Tichit, "RNA secondary structure comparison: exact
analysis of the Zhang-Sasha tree edit-algorithm," Theoretical Computer
Science, vol. 306, pp. 471-484, 2003.
[6] J. T. L. Wang, B.A. Shapiro, D. Shasha, K. Zhang and K.M. Currey, "An
algorithm for finding the largest approximately common substructures of
two trees," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 20, pp. 889-895, 1998.
[7] I. L. Hofacker, "Vienna RNA secondary structure server," Nucleic Acids
Research, vol. 31, pp. 3429-3431, 2003.
[8] T.J. Macke, D. J.Ecker, R.R. Gutell, D. Gautheret, D. A. Case and R.
Sampath, "RNAMotif, an RNA secondary structure definition and search
algorithm," Nucleic Acids Research, vol. 29, pp. 4724-4735, 2001.
[9] K. D. Pruitt, T. Tatusova and R. D. Maglott, "NCBI Reference Sequence
(RefSeq): a curated non-redundant sequence database of genomes,
transcripts and proteins," Nucleic Acids Research, vol. 35(Database issue),
pp. D61-65, 2007.
[10] G. Levi, "A note on the derivation of maximal common subgraphs of two
directed or undirected graphs," Calcolo, vol. 9, pp. 347-352, 1973.
[11] J. W. Raymond and P. Willett, "Maximum common subgraph
isomorphism algorithms for the matching of chemical structures," Journal
of Computer-aided Molecular Design, vol. 16, pp. 521-533, 2002.
[12] P. Pardalos and J. Xue, "The maximum clique problem," J. Global
Optimiz, vol. 4, pp. 301-328, 1994.
[13] R. Carraghan and P. M. Pardalos, "An exact algorithm for the maximum
clique problem," Operations Research Letters, vol. 9, pp.375-382, 1990.
[14] N. C. Lau, L. P. Lim, E.G. Weinstein and D. P. Bartel, "An abundant class
of tiny RNAs with probable regulatory roles in Caenorhabditis elegans,"
Science, vol. 294, pp. 858-862, 2001.
[15] L. P. Lim, N.C. Lau, E.G. Weinstein, A. Abdelhakim, S. Yekta, M. W.
Rhoades, C.B. Burge, D.P. Bartel, "The microRNAs of Caenorhabditis
elegans," Genes & Development, vol. 17, pp. 991-1008, 2003.
[16] S. Griffiths-Jones, A. Bateman, M. Marshall, A. Khanna and S. R. Eddy,
"Rfam: and RNA family database," Nucleic Acids Research, vol. 31, pp.
439-441, 2003.
[17] L. R. Otake, P. Scamborova, C. Hashimoto, J. A. Steltz, "The divergent
U12-type spliceosome is required for pre-mRNA splicing and is essential
for development in Drosophila," Molecular Cell, vol. 9, pp. 439-446,
2002.
[18] N. C. Andrews, "Disorders of iron metabolism," New England Journal of
Medicine, vol.341, pp. 1986-1995, 1999.
[19] C. A. R. Hoare, "Quicksort," Computer Journal, vol. 5, pp. 10-15, 1962.
[20] M. R. Garey and D. Johnson, "The complexity of near-optimal graph
coloring," Journal of the Association for Computing Machinery, vol. 23,
pp. 43-49, 1976.
[21] P. Turan, "On the theory of graphs," Colloq. Math, vol. 3, pp. 19-30,
1954.