Mining Association Rules from Unstructured Documents
This paper presents EART (Extract Association Rules from Text), a
system for discovering association rules in collections of
unstructured documents. EART processes text only, not images or
figures, and discovers association rules among the keywords that
label the documents in a collection. Its main characteristic is that
it integrates XML technology (to transform unstructured documents
into structured documents) with an information retrieval scheme
(TF-IDF) and a data mining technique for association rule extraction.
EART relies on word features to extract association rules and
consists of four phases: a structure phase, an index phase, a text
mining phase, and a visualization phase. Our work analyzes the
keywords in the extracted association rules in two ways: through
their co-occurrence in a single sentence of the original text, and
through the presence of the keywords in a sentence without
co-occurrence. Experiments were carried out on a collection of
scientific documents selected from MEDLINE that relate to the
outbreak of the H5N1 avian influenza virus.
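
The index and text mining phases described above can be illustrated with a minimal sketch. It is not the EART implementation: the abstract does not specify the tokenizer, the weighting variant, or the mining algorithm, so the whitespace tokenization, the `top_k` cutoff, the thresholds, and the function names `tfidf_keywords` and `pair_rules` are all assumptions made for illustration. The sketch weights words by TF-IDF, keeps the highest-weighted words of each document as its keywords, and then derives rules between pairs of keywords using support and confidence.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_keywords(docs, top_k=3):
    """Index-phase sketch (assumed design): keep the top_k TF-IDF words per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    keyword_sets = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Classic weighting: term frequency times log inverse document frequency.
        weights = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
        # Sort by descending weight; break ties alphabetically for determinism.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        keyword_sets.append(set(ranked[:top_k]))
    return keyword_sets


def pair_rules(keyword_sets, min_support=0.5, min_confidence=0.7):
    """Mining-phase sketch (assumed design): rules lhs -> rhs between single keywords."""
    n = len(keyword_sets)
    item_count = Counter(k for ks in keyword_sets for k in ks)
    pair_count = Counter(
        frozenset(pair)
        for ks in keyword_sets
        for pair in combinations(sorted(ks), 2)
    )
    rules = []
    for pair, count in pair_count.items():
        support = count / n  # fraction of documents containing both keywords
        if support < min_support:
            continue
        a, b = sorted(pair)
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

Calling `pair_rules(tfidf_keywords(docs))` on a list of document strings returns tuples of the form `(lhs, rhs, support, confidence)`. Only keyword pairs are considered here to keep the sketch short; rules over larger keyword sets would need a frequent-itemset search.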
@article{"International Journal of Information, Control and Computer Sciences:52624",
  author = "Hany Mahgoub",
  title = "Mining Association Rules from Unstructured Documents",
  abstract = "This paper presents EART (Extract Association Rules from Text), a
system for discovering association rules in collections of
unstructured documents. EART processes text only, not images or
figures, and discovers association rules among the keywords that
label the documents in a collection. Its main characteristic is that
it integrates XML technology (to transform unstructured documents
into structured documents) with an information retrieval scheme
(TF-IDF) and a data mining technique for association rule extraction.
EART relies on word features to extract association rules and
consists of four phases: a structure phase, an index phase, a text
mining phase, and a visualization phase. Our work analyzes the
keywords in the extracted association rules in two ways: through
their co-occurrence in a single sentence of the original text, and
through the presence of the keywords in a sentence without
co-occurrence. Experiments were carried out on a collection of
scientific documents selected from MEDLINE that relate to the
outbreak of the H5N1 avian influenza virus.",
  keywords = "Association rules, information retrieval, knowledge discovery in text, text mining.",
  volume = "2",
  number = "8",
  pages = "2613-6",
}