Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Classification is an important data mining technique
and can be used for data filtering in artificial intelligence. Because
it applies to all kinds of data, classification is used in nearly
every field of modern life. It groups items according to the features
judged interesting and useful. In this paper, we compare two
classification methods, Naive Bayes and ADTree, used to detect spam
e-mail. This choice is motivated by the fact that the Naive Bayes
algorithm is based on probability calculus while the ADTree algorithm is
based on decision trees. The parameter settings of these classifiers
are chosen to maximize the true positive rate and minimize the false
positive rate. The experimental results present classification accuracy
and cost analysis with a view to choosing the optimal classifier for
spam detection. We also point out the number of attributes that gives
a tradeoff between attribute count and classification accuracy.
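
The evaluation criteria named above (true positive rate, false positive rate, accuracy, and cost) can all be derived from a classifier's confusion counts. The following is a minimal sketch, not the procedure used in the paper: the class and function names, the example counts, and the linear cost model with per-error weights are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Confusion counts for a binary spam filter (spam = positive class)."""
    tp: int  # spam correctly flagged as spam
    fp: int  # legitimate mail wrongly flagged as spam
    tn: int  # legitimate mail correctly passed
    fn: int  # spam wrongly passed as legitimate


def true_positive_rate(c: ConfusionCounts) -> float:
    # Fraction of all spam messages that the filter catches.
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    # Fraction of all legitimate messages that the filter blocks.
    return c.fp / (c.fp + c.tn)


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all messages classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def average_cost(c: ConfusionCounts, fp_cost: float, fn_cost: float) -> float:
    # Assumed cost model: each error type has a fixed weight and
    # correct decisions cost nothing; result is the cost per message.
    total = c.tp + c.fp + c.tn + c.fn
    return (c.fp * fp_cost + c.fn * fn_cost) / total


# Hypothetical counts, not results from the paper.
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
print("TPR:", true_positive_rate(example))
print("FPR:", false_positive_rate(example))
print("Accuracy:", accuracy(example))
print("Cost per message:", average_cost(example, fp_cost=10.0, fn_cost=1.0))
```

Weighting a false positive more heavily than a false negative reflects the usual assumption in spam filtering that losing a legitimate message is worse than letting a spam message through. The weights 10 and 1 are placeholders.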



