Using Spectral Vectors and M-Tree for Graph Clustering and Searching in Graph Databases of Protein Structures

In this paper, we represent protein structure by using graph. A protein structure database will become a graph database. Each graph is represented by a spectral vector. We use Jacobi rotation algorithm to calculate the eigenvalues of the normalized Laplacian representation of adjacency matrix of graph. To measure the similarity between two graphs, we calculate the Euclidean distance between two graph spectral vectors. To cluster the graphs, we use M-tree with the Euclidean distance to cluster spectral vectors. Besides, M-tree can be used for graph searching in graph database. Our proposal method was tested with graph database of 100 graphs representing 100 protein structures downloaded from Protein Data Bank (PDB) and we compare the result with the SCOP hierarchical structure.




References:
[1] Brew C, Schulte im Walde (2002). Spectral Clustering for German
Verbs, In Proc of the Conf in Natural Language Processing, PA, USA,
pp 117-124
[2] Chris Godsil (2006). Graph Spectra and Graph Isomorphism. Aveiro
Workshop on Graph Spectra, University of Aveiro, Mathematics
Department, April 2006.
[3] David McWherter, William C. Regli (2001). An Approach to Indexing
Database of Solid Models. Technical Report DU-MCS-01-02.
[4] D. Shasha, J. T. L. Wang, and R. Giugno (2002). Algorithmics and
Applications of Tree and Graph Searching. In Proc. PODS'02
Proceeding of the International Conference in Pattern recognition
(ICPR), Quebec, Canada, August 2002.
[5] Do Phuc, Hoang Kiem (2006) . Using M-tree for similarity search in
biological sequence databases, Journal of Bio-Technology, VAST, ISBN
1811-4899, pp151-158
[6] Huan, J., Wang, W., Washington, A., Prins, J., Shah, R., Tropsha, A.
(2004). Accurate classification of proteins tructural families based on
coherent subgraph analysis. In Proc. Pacific Symposium on
Biocomputing 9:411-422.
[7] Huahai He, Ambuj K. Singh (2006). Closure-Tree: An Index Structure
for Graph Queries, ICDE.
[8] Jun Huan, Deepak Bandyopadhyay, Wei Wang, Jack Snoeyink, Jan
Prins, Alexander Tropsha (2005). Comparing Graph Representations of
Protein Structure for Mining Family-Specific Residue-Based Packing
Motifs. Journal of Computational Biology. July 2005, 12(6): 657-671.
[9] James Cheng, Yiping Ke, Wilfred Ng, An Lu (2007). FG-Index:
Towards Verification-Free Query Processing on Graph Databases,
SIGMOD, 2007.
[10] Mitchell Peabody (2002). Finding Groups of Graphs in Databases.
Master thesis in Computer Science, Drexel University, August, 2002.
[11] Murzin, A.G, Brenner, S., Hubbard, T. & Chothia., C. (1995). SCOP: a
structural classification of proteins database for the investigation of
sequences and structures. Journal of MolecularBiology, 247, 536-40.
[12] Norman Biggs. Algebraic Graph Theory. Cambridge University Press,
2nd edition, 1993.
[13] Paolo C., Marco P., Pavel Z (1997): M-tree: An efficient Access Method
for Similarity Search In Metric Spaces, VLDB 1997, pp:426-435.
[14] Patella M (1998). Similarity search in multimedia database, PhD
dissertation, University of Bolgona, Italia.
[15] Peixiang Zhao, Jeffrey Xu Yu, Philip S. Yu (2007). Graph Indexing:
Tree + Delta >= Tree, VLDB, 2007
[16] Steffen Lang (2007). Protein domain decomposition using spectral graph
partitioning.
[17] Saraswathi Vishveshwara et al (2002). Protein structure insights from
graph theory, Journal of Theoretical and Computational Chemistry, Vol
1, No 1.
[18] X. Yan, J. Han (2002). gSpan: Graph-based substructure pattern mining,
ICDM, 2002
[19] Xiafeng Yan, Philip S. Yu, Jiawei Han (2004). Graph indexing: A
Frequent Structure-based Approach, SIGMOD 2004
[20] Yun Chi, Yirong Yang, Richard Muntz, Mining Frequent Rooted Trees
and Free Trees Using Canonical Forms, UCLA Technical Report, 2003