A Survey of Semantic Integration Approaches in Bioinformatics

Technological advances in computer science and data
analysis continuously produce huge volumes of biological
data, which are available on the web. Such advances
call for powerful data integration techniques to
extract the knowledge and information pertinent to a specific question.
Biomedical exploration of these big data often requires
complex queries across multiple autonomous, heterogeneous,
and distributed data sources. Semantic integration is an active
area of research in several disciplines, such as databases,
information integration, and ontologies. We survey some
approaches and techniques for integrating biological data, focusing
on those developed in the ontology community.



