Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering

Outlier detection in streaming data is challenging because a stream cannot be scanned multiple times and new concepts may keep evolving. Irrelevant attributes, which can be regarded as noisy attributes, further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. The scheme is based on clustering, an unsupervised data mining task that does not require labeled data, and it combines density-based and partitioning clustering for outlier detection. Partitioning clustering is also used to assign each attribute a weight according to its relevance, and these weights are adaptive. Weighting the attributes helps to reduce or remove the effect of noisy attributes. In view of the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real-world data sets show that the proposed approach outperforms an existing approach (CORM) in terms of outlier detection rate and false alarm rate, and under increasing percentages of outliers.
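
To make the role of attribute weights concrete, the following is a minimal sketch and not the scheme proposed in the paper. It assumes that each attribute's weight is inversely related to its within-cluster dispersion and normalized to sum to one, and it omits the density-based clustering step and the incremental update over stream chunks. All function names, the exponent parameter `beta`, the farthest-first seeding, and the distance `threshold` are hypothetical choices made for illustration.

```python
def weighted_sq_dist(x, c, w):
    """Squared Euclidean distance with one weight per attribute."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def farthest_first_centers(points, k):
    """Deterministic seeding (an assumption): start from the first point, then
    repeatedly add the point farthest from the centers chosen so far."""
    ones = [1.0] * len(points[0])
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points,
                  key=lambda p: min(weighted_sq_dist(p, c, ones) for c in centers))
        centers.append(list(nxt))
    return centers


def attribute_weights(points, labels, centers, beta=2.0, eps=1e-9):
    """Assumed weighting rule: weight is proportional to
    (1 / within-cluster dispersion) ** (1 / (beta - 1)), with beta > 1."""
    dims = len(centers[0])
    disp = [eps] * dims  # eps avoids division by zero for constant attributes
    for x, lab in zip(points, labels):
        for j in range(dims):
            disp[j] += (x[j] - centers[lab][j]) ** 2
    inv = [(1.0 / d) ** (1.0 / (beta - 1.0)) for d in disp]
    total = sum(inv)
    return [v / total for v in inv]


def weighted_kmeans(points, k, iters=20):
    """k-means in which attribute weights are re-estimated after every pass."""
    dims = len(points[0])
    centers = farthest_first_centers(points, k)
    w = [1.0 / dims] * dims  # start with equal weights
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center under the current weights.
        labels = [min(range(k), key=lambda c: weighted_sq_dist(x, centers[c], w))
                  for x in points]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Adapt the weights: low-dispersion attributes gain influence.
        w = attribute_weights(points, labels, centers)
    return centers, w, labels


def flag_outliers(chunk, centers, w, threshold):
    """Flag points whose weighted distance to the nearest center exceeds
    a hypothetical threshold."""
    return [min(weighted_sq_dist(x, c, w) for c in centers) > threshold
            for x in chunk]


# Toy data: attribute 0 separates two groups, attribute 1 is uninformative.
points = [(0.05 * i, float(i * 7 % 4)) for i in range(20)]
points += [(10.0 + 0.05 * i, float(i * 7 % 4)) for i in range(20)]
centers, w, labels = weighted_kmeans(points, k=2)
flags = flag_outliers([(0.5, 1.0), (5.0, 1.5)], centers, w, threshold=1.0)
```

On this toy data the informative attribute receives the larger weight, so a point lying between the two groups is flagged while a point close to a cluster center is not. In a streaming setting, the same weight update could be applied once per incoming chunk, which is one way the weights could remain adaptive.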




