Anomaly Detection and Characterization to Classify Traffic Anomalies Case Study: TOT Public Company Limited Network

This paper presents four unsupervised clustering algorithms, namely sIB, RandomFlatClustering, FarthestFirst, and FilteredClusterer, that previous work has not applied to network traffic classification. We describe the methodology, the results, the clusters produced, and an evaluation of each algorithm in terms of accuracy. We also assess the efficiency of each algorithm by the time it takes to generate clusters quickly and correctly. Our work studies and tests which algorithm is best at classifying traffic anomalies in network traffic, using attributes that have not been used before. We analyze the algorithm that has the best efficiency or the best learning and compare it with the previously used K-Means algorithm. Our research will be used to develop anomaly detection systems that are more efficient and better suited to future requirements.




