DWM-CDD: Dynamic Weighted Majority Concept Drift Detection for Spam Mail Filtering

Although e-mail is among the most efficient and popular means of communication, unsolicited bulk e-mail, commonly called spam, threatens the viability of the mail system. This paper proposes a new algorithm for content-based spam filtering called Dynamic Weighted Majority Concept Drift Detection (DWM-CDD). DWM-CDD has two design goals: first, to improve on the accuracy of previously proposed algorithms, and second, to reduce the time needed to construct the model. The results show that DWM-CDD detects both sudden and gradual concept drift quickly and accurately. Moreover, it requires less time for model construction than previously proposed algorithms.




