Using Textual Pre-Processing and Text Mining to Create Semantic Links

This article presents an approach to the automatic discovery
of semantic concepts and links in the domain of Oil Exploration
and Production (E&P). Machine learning methods were combined with
textual pre-processing techniques to detect local patterns in
texts and thereby generate new concepts and new semantic links. Even
with the more specific vocabularies found within the oil domain, the
approach achieved satisfactory results, suggesting that it can
be applied to other domains and languages with only minor
adjustments.



