PIELG: A Protein Interaction Extraction System Using a Link Grammar Parser from Biomedical Abstracts

Due to the ever-growing number of publications on protein-protein interactions, information extraction from text is increasingly recognized as one of the crucial technologies in bioinformatics. This paper presents PIELG, a Protein Interaction Extraction System that uses a Link Grammar Parser to extract interactions from biomedical abstracts. PIELG uses the linkage produced by the Link Grammar Parser to start a case-based analysis of the contents of the various syntactic roles, as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verb patterns to overcome problems with preposition combinations. Recall and precision are 74.4% and 62.65%, respectively. Experimental comparison with two other state-of-the-art extraction systems indicates that PIELG achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The results show that the performance is promising.



