Maximum Common Substructure Extraction in RNA Secondary Structures Using Clique Detection Approach
The similarity comparison of RNA secondary
structures is important in studying the functions of RNAs. In recent
years, most existing tools represent the secondary structures by
tree-based presentation and calculate the similarity by tree alignment
distance. Different to previous approaches, we propose a new method
based on maximum clique detection algorithm to extract the maximum
common structural elements in compared RNA secondary structures.
A new graph-based similarity measurement and maximum common
subgraph detection procedures for comparing purely RNA secondary
structures is introduced. Given two RNA secondary structures, the
proposed algorithm consists of a process to determine the score of the
structural similarity, followed by comparing vertices labelling, the
labelled edges and the exact degree of each vertex. The proposed
algorithm also consists of a process to extract the common structural
elements between compared secondary structures based on a proposed
maximum clique detection of the problem. This graph-based model
also can work with NC-IUB code to perform the pattern-based
searching. Therefore, it can be used to identify functional RNA motifs
from database or to extract common substructures between complex
RNA secondary structures. We have proved the performance of this
proposed algorithm by experimental results. It provides a new idea of
comparing RNA secondary structures. This tool is helpful to those
who are interested in structural bioinformatics.
[1] M. Hochsmann, B. Voss, and R. Giegerich, "Pure multiple RNA
secondary structure alignments: a progressive profile approach," IEEE
Trans. on Computational Biology and Bioinformatics, vol. 1, pp. 53-62,
2004.
[2] J. Liu, J. T. L. Wang, J. Hu, and B. Tian, "A method for aligning RNA
secondary structures and its application to RNA motif detection," BMC
Bioinformatics, vol. 6, pp. 89-109, 2005.
[3] J. Allali and M. F. Sagot, "A new distance for high level RNA secondary
structure comparison," IEEE Trans. on Computational Biology and
Bioinformatics, vol. 2, pp. 1-11, 2005.
[4] G. D. Collins, S. Le and K. Zhang, "A new algorithm for computing
similarity between RNA structures," Information Sciences, vol. 139, pp.
59-77, 2001.
[5] S. Dulucq and L. Tichit, "RNA secondary structure comparison: exact
analysis of the Zhang-Sasha tree edit-algorithm," Theoretical Computer
Science, vol. 306, pp. 471-484, 2003.
[6] J. T. L. Wang, B.A. Shapiro, D. Shasha, K. Zhang and K.M. Currey, "An
algorithm for finding the largest approximately common substructures of
two trees," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 20, pp. 889-895, 1998.
[7] I. L. Hofacker, "Vienna RNA secondary structure server," Nucleic Acids
Research, vol. 31, pp. 3429-3431, 2003.
[8] T.J. Macke, D. J.Ecker, R.R. Gutell, D. Gautheret, D. A. Case and R.
Sampath, "RNAMotif, an RNA secondary structure definition and search
algorithm," Nucleic Acids Research, vol. 29, pp. 4724-4735, 2001.
[9] K. D. Pruitt, T. Tatusova and R. D. Maglott, "NCBI Reference Sequence
(RefSeq): a curated non-redundant sequence database of genomes,
transcripts and proteins," Nucleic Acids Research, vol. 35(Database issue),
pp. D61-65, 2007.
[10] G. Levi, "A note on the derivation of maximal common subgraphs of two
directed or undirected graphs," Calcolo, vol. 9, pp. 347-352, 1973.
[11] J. W. Raymond and P. Willett, "Maximum common subgraph
isomorphism algorithms for the matching of chemical structures," Journal
of Computer-aided Molecular Design, vol. 16, pp. 521-533, 2002.
[12] P. Pardalos and J. Xue, "The maximum clique problem," J. Global
Optimiz, vol. 4, pp. 301-328, 1994.
[13] R. Carraghan and P. M. Pardalos, "An exact algorithm for the maximum
clique problem," Operations Research Letters, vol. 9, pp.375-382, 1990.
[14] N. C. Lau, L. P. Lim, E.G. Weinstein and D. P. Bartel, "An abundant class
of tiny RNAs with probable regulatory roles in Caenorhabditis elegans,"
Science, vol. 294, pp. 858-862, 2001.
[15] L. P. Lim, N.C. Lau, E.G. Weinstein, A. Abdelhakim, S. Yekta, M. W.
Rhoades, C.B. Burge, D.P. Bartel, "The microRNAs of Caenorhabditis
elegans," Genes & Development, vol. 17, pp. 991-1008, 2003.
[16] S. Griffiths-Jones, A. Bateman, M. Marshall, A. Khanna and S. R. Eddy,
"Rfam: and RNA family database," Nucleic Acids Research, vol. 31, pp.
439-441, 2003.
[17] L. R. Otake, P. Scamborova, C. Hashimoto, J. A. Steltz, "The divergent
U12-type spliceosome is required for pre-mRNA splicing and is essential
for development in Drosophila," Molecular Cell, vol. 9, pp. 439-446,
2002.
[18] N. C. Andrews, "Disorders of iron metabolism," New England Journal of
Medicine, vol.341, pp. 1986-1995, 1999.
[19] C. A. R. Hoare, "Quicksort," Computer Journal, vol. 5, pp. 10-15, 1962.
[20] M. R. Garey and D. Johnson, "The complexity of near-optimal graph
coloring," Journal of the Association for Computing Machinery, vol. 23,
pp. 43-49, 1976.
[21] P. Turan, "On the theory of graphs," Colloq. Math, vol. 3, pp. 19-30,
1954.
[1] M. Hochsmann, B. Voss, and R. Giegerich, "Pure multiple RNA
secondary structure alignments: a progressive profile approach," IEEE
Trans. on Computational Biology and Bioinformatics, vol. 1, pp. 53-62,
2004.
[2] J. Liu, J. T. L. Wang, J. Hu, and B. Tian, "A method for aligning RNA
secondary structures and its application to RNA motif detection," BMC
Bioinformatics, vol. 6, pp. 89-109, 2005.
[3] J. Allali and M. F. Sagot, "A new distance for high level RNA secondary
structure comparison," IEEE Trans. on Computational Biology and
Bioinformatics, vol. 2, pp. 1-11, 2005.
[4] G. D. Collins, S. Le and K. Zhang, "A new algorithm for computing
similarity between RNA structures," Information Sciences, vol. 139, pp.
59-77, 2001.
[5] S. Dulucq and L. Tichit, "RNA secondary structure comparison: exact
analysis of the Zhang-Sasha tree edit-algorithm," Theoretical Computer
Science, vol. 306, pp. 471-484, 2003.
[6] J. T. L. Wang, B.A. Shapiro, D. Shasha, K. Zhang and K.M. Currey, "An
algorithm for finding the largest approximately common substructures of
two trees," IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 20, pp. 889-895, 1998.
[7] I. L. Hofacker, "Vienna RNA secondary structure server," Nucleic Acids
Research, vol. 31, pp. 3429-3431, 2003.
[8] T.J. Macke, D. J.Ecker, R.R. Gutell, D. Gautheret, D. A. Case and R.
Sampath, "RNAMotif, an RNA secondary structure definition and search
algorithm," Nucleic Acids Research, vol. 29, pp. 4724-4735, 2001.
[9] K. D. Pruitt, T. Tatusova and R. D. Maglott, "NCBI Reference Sequence
(RefSeq): a curated non-redundant sequence database of genomes,
transcripts and proteins," Nucleic Acids Research, vol. 35(Database issue),
pp. D61-65, 2007.
[10] G. Levi, "A note on the derivation of maximal common subgraphs of two
directed or undirected graphs," Calcolo, vol. 9, pp. 347-352, 1973.
[11] J. W. Raymond and P. Willett, "Maximum common subgraph
isomorphism algorithms for the matching of chemical structures," Journal
of Computer-aided Molecular Design, vol. 16, pp. 521-533, 2002.
[12] P. Pardalos and J. Xue, "The maximum clique problem," J. Global
Optimiz, vol. 4, pp. 301-328, 1994.
[13] R. Carraghan and P. M. Pardalos, "An exact algorithm for the maximum
clique problem," Operations Research Letters, vol. 9, pp.375-382, 1990.
[14] N. C. Lau, L. P. Lim, E.G. Weinstein and D. P. Bartel, "An abundant class
of tiny RNAs with probable regulatory roles in Caenorhabditis elegans,"
Science, vol. 294, pp. 858-862, 2001.
[15] L. P. Lim, N.C. Lau, E.G. Weinstein, A. Abdelhakim, S. Yekta, M. W.
Rhoades, C.B. Burge, D.P. Bartel, "The microRNAs of Caenorhabditis
elegans," Genes & Development, vol. 17, pp. 991-1008, 2003.
[16] S. Griffiths-Jones, A. Bateman, M. Marshall, A. Khanna and S. R. Eddy,
"Rfam: and RNA family database," Nucleic Acids Research, vol. 31, pp.
439-441, 2003.
[17] L. R. Otake, P. Scamborova, C. Hashimoto, J. A. Steltz, "The divergent
U12-type spliceosome is required for pre-mRNA splicing and is essential
for development in Drosophila," Molecular Cell, vol. 9, pp. 439-446,
2002.
[18] N. C. Andrews, "Disorders of iron metabolism," New England Journal of
Medicine, vol.341, pp. 1986-1995, 1999.
[19] C. A. R. Hoare, "Quicksort," Computer Journal, vol. 5, pp. 10-15, 1962.
[20] M. R. Garey and D. Johnson, "The complexity of near-optimal graph
coloring," Journal of the Association for Computing Machinery, vol. 23,
pp. 43-49, 1976.
[21] P. Turan, "On the theory of graphs," Colloq. Math, vol. 3, pp. 19-30,
1954.
@article{"International Journal of Information, Control and Computer Sciences:58459", author = "Shih-Yi Chao", title = "Maximum Common Substructure Extraction in RNA Secondary Structures Using Clique Detection Approach", abstract = "The similarity comparison of RNA secondary
structures is important in studying the functions of RNAs. In recent
years, most existing tools represent the secondary structures by
tree-based presentation and calculate the similarity by tree alignment
distance. Different to previous approaches, we propose a new method
based on maximum clique detection algorithm to extract the maximum
common structural elements in compared RNA secondary structures.
A new graph-based similarity measurement and maximum common
subgraph detection procedures for comparing purely RNA secondary
structures is introduced. Given two RNA secondary structures, the
proposed algorithm consists of a process to determine the score of the
structural similarity, followed by comparing vertices labelling, the
labelled edges and the exact degree of each vertex. The proposed
algorithm also consists of a process to extract the common structural
elements between compared secondary structures based on a proposed
maximum clique detection of the problem. This graph-based model
also can work with NC-IUB code to perform the pattern-based
searching. Therefore, it can be used to identify functional RNA motifs
from database or to extract common substructures between complex
RNA secondary structures. We have proved the performance of this
proposed algorithm by experimental results. It provides a new idea of
comparing RNA secondary structures. This tool is helpful to those
who are interested in structural bioinformatics.", keywords = "Clique detection, labeled vertices, RNA secondary
structures, subgraph, similarity.", volume = "2", number = "9", pages = "3092-10", }