Use of Bayesian Network in Information Extraction from Unstructured Data Sources

This paper applies Bayesian networks to support information extraction from unstructured, ungrammatical, and incoherent data sources for semantic annotation. A tool has been developed that combines ontologies, machine learning, information extraction, and probabilistic reasoning techniques to support the extraction process. Data acquisition is guided by knowledge specified in the form of an ontology. Because the amount of information available varies from one data source to another, the extracted data often contains missing values for certain variables of interest, and it is desirable to predict these values. The methodology presented in this paper first learns a Bayesian network from the training data and then uses it to predict missing values and to resolve conflicts. Experiments have been conducted to analyze the performance of the methodology. The results are promising: the methodology achieves a high degree of precision and recall for information extraction and reasonably good accuracy for predicting missing values.
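
The two-step idea described above (learn a Bayesian network from training data, then use it to predict a missing value) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the variable names (`brand`, `model`, `price_band`), the hand-fixed network structure, the toy records, and the use of frequency counts with Laplace smoothing to estimate the conditional probability tables. The paper's actual structure-learning and inference procedures may differ.

```python
from collections import Counter, defaultdict

# Hypothetical structure (assumption): each variable maps to its parents.
STRUCTURE = {"brand": (), "model": ("brand",), "price_band": ("model",)}

# Hypothetical complete training records (assumption, not data from the paper).
TRAIN = [
    {"brand": "A", "model": "a1", "price_band": "low"},
    {"brand": "A", "model": "a1", "price_band": "low"},
    {"brand": "A", "model": "a2", "price_band": "high"},
    {"brand": "B", "model": "b1", "price_band": "high"},
    {"brand": "B", "model": "b1", "price_band": "high"},
    {"brand": "B", "model": "b1", "price_band": "low"},
]


def learn_cpts(records, structure, alpha=1.0):
    """Estimate P(var | parents) from complete records with Laplace smoothing."""
    domains = {v: sorted({r[v] for r in records}) for v in structure}
    counts = {v: defaultdict(Counter) for v in structure}
    for r in records:
        for v, parents in structure.items():
            counts[v][tuple(r[p] for p in parents)][r[v]] += 1

    def prob(var, value, parent_values):
        c = counts[var][parent_values]
        return (c[value] + alpha) / (sum(c.values()) + alpha * len(domains[var]))

    return domains, prob


def predict_missing(record, target, structure, domains, prob):
    """Posterior over `target` when every other variable is observed.

    With a single missing variable, the posterior is proportional to the
    joint probability of the record completed with each candidate value.
    """
    scores = {}
    for candidate in domains[target]:
        full = dict(record, **{target: candidate})
        score = 1.0
        for v, parents in structure.items():
            score *= prob(v, full[v], tuple(full[p] for p in parents))
        scores[candidate] = score
    total = sum(scores.values())
    return {value: s / total for value, s in scores.items()}


domains, prob = learn_cpts(TRAIN, STRUCTURE)
# An extracted record whose "brand" field is missing.
posterior = predict_missing(
    {"model": "a1", "price_band": "low"}, "brand", STRUCTURE, domains, prob
)
print(max(posterior, key=posterior.get), posterior)
```

The most probable value can be used to fill the gap, and the same posterior could in principle be used to flag a conflict when an extracted value receives a very low probability. This sketch handles only one missing variable per record; several missing variables would require marginalizing over the unobserved ones.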



