An Algebra for Protein Structure Data

This paper presents an algebraic approach to optimizing queries in a domain-specific database management system for protein structure data. The approach introduces several protein-structure-specific algebraic operators for querying the complex data stored in an object-oriented database system. The Protein Algebra provides an extensible set of high-level Genomic Data Types and Protein Data Types, along with a comprehensive collection of corresponding genomic and protein functions. The paper also presents a query translator that converts high-level query specifications written in the algebra into low-level query specifications in Protein-QL, a query language designed for protein structure data. The translation process uses a Protein Ontology that serves as a dictionary.



