An Empirical Analysis of Arabic WebPages Classification using Fuzzy Operators

This study presents a fuzzy similarity approach for classifying Arabic web pages. The approach uses a fuzzy term-category relation, manipulating the membership degrees obtained from the training data and the degree values of a test web page. Six measures are applied and compared: Einstein, Algebraic, Hamacher, MinMax, Special case fuzzy, and Bounded Difference. The comparison uses 50 different Arabic web pages. The Einstein measure gave the best performance among the six. The study also provides an analysis of these measures and concluding remarks.
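To make the operator names concrete, the following is a minimal sketch of five of the six measures, written with their standard textbook t-norm definitions. The abstract does not give the formulas used in the study, so these definitions, the function names, the De Morgan dual used for the disjunction, and the ratio form of `fuzzy_similarity` are all assumptions for illustration. The "Special case fuzzy" measure is omitted because its definition is not stated here.

```python
# Illustrative sketch only: standard textbook definitions, not necessarily
# the exact formulas used in the study. All names here are hypothetical.

def einstein_product(a, b):
    # Einstein product t-norm: ab / (2 - (a + b - ab))
    return (a * b) / (2.0 - (a + b - a * b))

def algebraic_product(a, b):
    # Algebraic product t-norm: ab
    return a * b

def hamacher_product(a, b):
    # Hamacher product t-norm: ab / (a + b - ab), defined as 0 when a = b = 0
    denom = a + b - a * b
    return 0.0 if denom == 0.0 else (a * b) / denom

def minimum(a, b):
    # Min t-norm; its dual conorm is max, hence "MinMax"
    return min(a, b)

def bounded_difference(a, b):
    # Bounded difference t-norm: max(0, a + b - 1)
    return max(0.0, a + b - 1.0)

def dual_conorm(t_norm):
    # De Morgan dual of a t-norm: s(a, b) = 1 - t(1 - a, 1 - b)
    return lambda a, b: 1.0 - t_norm(1.0 - a, 1.0 - b)

def fuzzy_similarity(page, category, t_norm):
    # Assumed form: sum of fuzzy conjunctions over sum of fuzzy disjunctions,
    # taken over every term that appears in the page or the category.
    # `page` and `category` map each term to a membership degree in [0, 1].
    s_norm = dual_conorm(t_norm)
    terms = set(page) | set(category)
    num = sum(t_norm(page.get(t, 0.0), category.get(t, 0.0)) for t in terms)
    den = sum(s_norm(page.get(t, 0.0), category.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0

def classify(page, categories, t_norm):
    # Pick the category whose term-membership profile is most similar to the page.
    return max(categories, key=lambda c: fuzzy_similarity(page, categories[c], t_norm))
```

Under these assumptions, a test page would be assigned by calling `classify(page, categories, einstein_product)`, where `categories` maps each category name to its term-membership dictionary. Swapping the last argument for another operator reproduces the kind of side-by-side comparison the study describes.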



