A System to Integrate and Manipulate Protein Database Using BioPerl and XML

The size, complexity, and number of protein databases have grown faster than the ability of bioinformatics tools to handle such distributed information. Integrating information from different databases into a single database remains a challenging problem. Our research develops a tool for accessing and manipulating protein information from different databases. In our approach, we integrate databases such as Swiss-Prot, PDB, InterPro, and EMBL, transforming them from flat-file format into relational form using XML and BioPerl. Our results show that the tool can search protein data sets of different sizes stored in the relational database, and that results are retrieved faster than from the flat-file databases. A web-based user interface allows users to search for protein information in the local database.
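The flat-file to XML to relational path described above can be illustrated with a minimal sketch. It is not the system's implementation: it uses Python's standard library in place of BioPerl, the record is a made-up and heavily simplified Swiss-Prot-style entry, and the `protein` table, the XML element names, and the helpers `flat_to_xml` and `load_entry` are all hypothetical names chosen for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified Swiss-Prot-style record (not real data).
FLAT_RECORD = """\
ID   EXAMPLE_HUMAN
AC   P00000;
DE   Hypothetical example protein.
SQ   SEQUENCE
     MKTAYIAKQR QISFVKSHFS
//
"""


def flat_to_xml(record: str) -> ET.Element:
    """Convert one flat-file record into an intermediate XML element."""
    entry = ET.Element("entry")
    seq_parts = []
    in_sequence = False
    for line in record.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Sequence lines are indented and split into blocks by spaces.
            seq_parts.append(line.replace(" ", ""))
            continue
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            ET.SubElement(entry, "name").text = value.split()[0]
        elif code == "AC":
            ET.SubElement(entry, "accession").text = value.rstrip(";")
        elif code == "DE":
            ET.SubElement(entry, "description").text = value
        elif code == "SQ":
            in_sequence = True
    ET.SubElement(entry, "sequence").text = "".join(seq_parts)
    return entry


def load_entry(conn: sqlite3.Connection, entry: ET.Element) -> None:
    """Store the XML entry as one row of a (hypothetical) protein table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS protein ("
        "accession TEXT PRIMARY KEY, name TEXT, "
        "description TEXT, sequence TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO protein VALUES (?, ?, ?, ?)",
        (
            entry.findtext("accession"),
            entry.findtext("name"),
            entry.findtext("description"),
            entry.findtext("sequence"),
        ),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_entry(conn, flat_to_xml(FLAT_RECORD))
    # An indexed lookup by accession replaces a scan of the flat file.
    print(conn.execute(
        "SELECT name, sequence FROM protein WHERE accession = ?",
        ("P00000",),
    ).fetchone())
```

Once records are stored this way, a search by accession becomes a primary-key lookup rather than a sequential scan of the flat file, which is one plausible source of the faster retrieval reported above.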



