A Text Mining Technique Using Association Rules Extraction

This paper describes a text mining technique for automatically extracting association rules from collections of textual documents. The technique, called Extracting Association Rules from Text (EART), relies on keyword features to discover association rules among the keywords labeling the documents. The EART system ignores the order in which words occur and focuses instead on the words and their statistical distributions in the documents. The main contributions of the technique are that it integrates XML technology with an Information Retrieval scheme (TFIDF) for keyword/feature selection, which automatically selects the most discriminative keywords for use in association rule generation, and that it uses a Data Mining technique for association rule discovery. The system consists of three phases: a Text Preprocessing phase (transformation, filtration, stemming, and indexing of the documents), an Association Rule Mining (ARM) phase (applying our algorithm for Generating Association Rules based on a Weighting scheme, GARW), and a Visualization phase (visualization of results). Experiments were conducted on web-page news documents related to the outbreak of bird flu. The extracted association rules contain important features and describe the informative news included in the document collection. The performance of the EART system is compared with that of another system that uses the Apriori algorithm, in terms of execution time and an evaluation of the extracted association rules.
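The keyword selection and rule mining steps described above can be illustrated with a minimal Python sketch. It is not the GARW algorithm or the EART implementation, whose details are not given in this abstract. It assumes whitespace tokenization, the textbook TFIDF weight tf * log(N / df), and brute-force counting of keyword pairs. The function names `top_keywords` and `pair_rules`, the parameter `top_n`, and the threshold values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def top_keywords(docs, top_n):
    """Keep, per document, the top_n words with the highest TFIDF weight.

    Assumption: plain whitespace tokenization, no stemming or filtration.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    selected = []
    for tokens in tokenized:
        tf = Counter(tokens)  # word order is ignored, only counts matter
        weights = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
        # Sort by weight (descending), then alphabetically for determinism.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        selected.append(set(ranked[:top_n]))
    return selected


def pair_rules(keyword_sets, min_support, min_confidence):
    """Return rules (a, b, support, confidence), each read as a -> b.

    Assumption: only rules between two single keywords are generated.
    """
    n_docs = len(keyword_sets)
    single = Counter(w for ks in keyword_sets for w in ks)
    pairs = Counter(
        pair for ks in keyword_sets for pair in combinations(sorted(ks), 2)
    )
    rules = []
    for (a, b), count in pairs.items():
        support = count / n_docs
        if support < min_support:
            continue
        # Try both directions of the pair as candidate rules.
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


if __name__ == "__main__":
    docs = ["bird flu outbreak", "bird flu vaccine", "bird market"]
    keyword_sets = top_keywords(docs, top_n=2)
    for rule in pair_rules(keyword_sets, min_support=0.3, min_confidence=0.5):
        print(rule)
```

In this sketch a rule a -> b is kept when the fraction of documents containing both keywords reaches `min_support` and the fraction of documents with a that also contain b reaches `min_confidence`. Restricting the mining step to the TFIDF-selected keywords, rather than all words, is what keeps the number of candidate pairs small.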




