A Comparative Study of Web-pages Classification Methods using Fuzzy Operators Applied to Arabic Web-pages

In this study, a fuzzy similarity approach for Arabic web page classification is presented. The approach uses a fuzzy term-category relation, combining the membership degree of each term learned from the training data with the degree value of that term in a test web page. Six measures are compared: Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy, and Bounded Difference. The measures are applied to 50 different Arabic web pages. The Einstein measure gave the best performance of the six. An analysis of these measures and concluding remarks are also presented.
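To make the compared operators concrete, the following minimal Python sketch implements the textbook conjunction/disjunction pairs for five of the six named measures. It is an illustration under stated assumptions, not the study's implementation: the abstract does not give the formulas, so the standard definitions are used; the similarity is assumed to be the ratio of summed conjunctions to summed disjunctions over terms, as is common in fuzzy similarity approaches to text classification; and all names (`OPERATORS`, `fuzzy_similarity`, `classify`) and the sample membership values are hypothetical. The "Special case fuzzy" measure is omitted because the abstract does not define it.

```python
from typing import Callable, Dict, Tuple

FuzzyOp = Callable[[float, float], float]

# Standard fuzzy conjunction (t-norm) / disjunction (t-conorm) pairs.
# Inputs are membership degrees in [0, 1].

def einstein_product(a: float, b: float) -> float:
    return (a * b) / (2.0 - (a + b - a * b))

def einstein_sum(a: float, b: float) -> float:
    return (a + b) / (1.0 + a * b)

def algebraic_product(a: float, b: float) -> float:
    return a * b

def algebraic_sum(a: float, b: float) -> float:
    return a + b - a * b

def hamacher_product(a: float, b: float) -> float:
    denom = a + b - a * b
    return 0.0 if denom == 0.0 else (a * b) / denom  # 0 when a = b = 0

def hamacher_sum(a: float, b: float) -> float:
    denom = 1.0 - a * b
    return 1.0 if denom == 0.0 else (a + b - 2.0 * a * b) / denom  # 1 when a = b = 1

def bounded_difference(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)

def bounded_sum(a: float, b: float) -> float:
    return min(1.0, a + b)

# Measure name -> (conjunction, disjunction). Keys are hypothetical labels.
OPERATORS: Dict[str, Tuple[FuzzyOp, FuzzyOp]] = {
    "einstein": (einstein_product, einstein_sum),
    "algebraic": (algebraic_product, algebraic_sum),
    "hamacher": (hamacher_product, hamacher_sum),
    "minmax": (min, max),
    "bounded_difference": (bounded_difference, bounded_sum),
}

def fuzzy_similarity(term_category: Dict[str, float], page: Dict[str, float],
                     t_norm: FuzzyOp, t_conorm: FuzzyOp) -> float:
    """Assumed form: sum of conjunctions divided by sum of disjunctions."""
    terms = set(term_category) | set(page)
    num = sum(t_norm(term_category.get(t, 0.0), page.get(t, 0.0)) for t in terms)
    den = sum(t_conorm(term_category.get(t, 0.0), page.get(t, 0.0)) for t in terms)
    return num / den if den > 0.0 else 0.0

def classify(categories: Dict[str, Dict[str, float]], page: Dict[str, float],
             measure: str) -> str:
    """Return the category whose term memberships are most similar to the page."""
    t_norm, t_conorm = OPERATORS[measure]
    return max(categories,
               key=lambda c: fuzzy_similarity(categories[c], page, t_norm, t_conorm))

if __name__ == "__main__":
    # Made-up membership degrees, for illustration only.
    categories = {
        "sports": {"goal": 0.9, "team": 0.8},
        "economy": {"market": 0.9, "bank": 0.7},
    }
    page = {"goal": 0.7, "team": 0.5, "bank": 0.1}
    for name in OPERATORS:
        print(name, classify(categories, page, name))
```

Keeping each measure as a pair of plain functions means the same `fuzzy_similarity` routine can be reused for every operator, so a comparison across measures only changes the dictionary key.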




