A Web Text Mining Flexible Architecture

Text Mining is an important step of the Knowledge Discovery process. It is used to extract hidden information from unstructured or semi-structured data. This capability is fundamental on the Web, where much of the information is semi-structured (owing to the nested structure of HTML code), linked, and redundant. Web Text Mining supports the whole knowledge mining process in the mining, extraction and integration of useful data, information and knowledge from Web page contents. In this paper, we present a Web Text Mining process able to discover knowledge in a distributed and heterogeneous multi-organization environment. The process is based on a flexible architecture and is implemented in four steps that examine Web content and extract useful hidden information through mining techniques. Our Web Text Mining prototype starts from the retrieval of Web job offers, from which a Text Mining process draws out the information needed to classify them quickly: essentially, the job location and the required skills.
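
The extraction of location and skills from a job offer can be illustrated with a minimal Python sketch. It assumes a simple dictionary (gazetteer) lookup over the page text after the HTML markup has been stripped. The `PLACES` and `SKILLS` lists, the function names, and this matching technique are hypothetical choices for illustration. They are not the four steps of the prototype described here.

```python
import re
from html.parser import HTMLParser

# Hypothetical gazetteers: small illustrative lists, not the prototype's resources.
PLACES = {"bari", "milan", "rome", "vienna"}
SKILLS = {"java", "sql", "python", "data mining", "html"}


class _TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html):
    """Flatten the nested HTML structure into plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def find_terms(text, gazetteer):
    """Return gazetteer entries that occur in the text as whole words (case-insensitive)."""
    # Normalize to lowercase alphanumeric tokens separated by single spaces,
    # so multi-word entries such as "data mining" match regardless of spacing.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    return {
        term
        for term in gazetteer
        if re.search(r"\b" + re.escape(term) + r"\b", normalized)
    }


def classify_job_offer(html):
    """Extract the two fields used for fast classification: place and skills."""
    text = html_to_text(html)
    return {
        "place": sorted(find_terms(text, PLACES)),
        "skills": sorted(find_terms(text, SKILLS)),
    }


if __name__ == "__main__":
    offer = ("<html><body><h1>Java Developer</h1>"
             "<p>Location: Bari. Required: SQL, data   mining.</p></body></html>")
    print(classify_job_offer(offer))
    # {'place': ['bari'], 'skills': ['data mining', 'java', 'sql']}
```

Matching whole words keeps "java" from firing on "JavaScript". A real deployment would need much larger dictionaries and normalization of synonyms.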



