Network Intrusion Detection Design Using Feature Selection of Soft Computing Paradigms

Network traffic data used to design intrusion detection systems are typically large, contain much irrelevant information, and provide only limited and ambiguous information about users' activities. We study these problems and propose a two-phase approach to intrusion detection design. In the first phase, we develop a correlation-based feature selection algorithm that removes worthless information from the original high-dimensional database. In the second phase, we design an intrusion detection method that addresses the uncertainty caused by limited and ambiguous information. In the experiments, we use six UCI databases and the DARPA KDD99 intrusion detection data set for evaluation. Empirical studies indicate that our feature selection algorithm is capable of reducing the size of the data set, and that our intrusion detection method achieves better performance than the other participating intrusion detectors.
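
The abstract does not spell out the feature selection algorithm, so the following is only a minimal sketch of what a generic correlation-based filter can look like, not the method proposed in this paper. It assumes numeric features, uses Pearson correlation as the dependency measure, and keeps a feature when it is sufficiently correlated with the class label and not nearly duplicated by a feature already kept. The function names, the thresholds `min_relevance` and `max_redundancy`, and the toy columns are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, labels, min_relevance=0.1, max_redundancy=0.9):
    """Greedy filter: rank features by |correlation with the label|, then keep
    each one only if it is not too strongly correlated with a kept feature.
    Both thresholds are illustrative assumptions, not values from the paper."""
    relevance = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    selected = []
    for name in ranked:
        if relevance[name] < min_relevance:
            break  # every remaining feature is even less relevant
        redundant = any(
            abs(pearson(columns[name], columns[kept])) >= max_redundancy
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data (hypothetical): 0 = normal connection, 1 = attack.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "duration": [1, 2, 1, 8, 9, 7],
    "duration_copy": [2, 4, 2, 16, 18, 14],  # exact multiple of "duration"
    "flag": [0, 1, 0, 1, 0, 1],
    "noise": [5, 5, 5, 5, 5, 5],  # constant, carries no information
}
selected = select_features(columns, labels)
print(selected)
```

On this toy input the constant column is discarded as irrelevant and only one of the two perfectly correlated duration columns survives, which is the kind of size reduction the first phase aims for.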



