A Proposed Hybrid Approach for Feature Selection in Text Document Categorization

Text document categorization involves a large number of features. This high dimensionality is troublesome and can degrade classification performance. Feature selection is therefore regarded as one of the crucial steps in text document categorization. Selecting the best features to represent documents reduces the dimensionality of the feature space and hence can improve performance. Many approaches have been implemented by various researchers to address this problem. This paper proposes a novel hybrid approach for feature selection in text document categorization based on Ant Colony Optimization (ACO) and Information Gain (IG). We also review state-of-the-art algorithms by several other researchers.
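To make the Information Gain component concrete, the following is a minimal sketch of IG-based term scoring under the standard textbook definition (class entropy minus the entropy conditioned on term presence). It is not the hybrid method proposed in this paper, and the ACO search step is not shown. The function names `entropy`, `information_gain`, and `rank_terms`, and the representation of each document as a set of tokens, are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """IG of a term: H(class) - H(class | term present or absent).

    docs   -- list of sets of tokens (hypothetical representation)
    labels -- list of class labels, aligned with docs
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = (
        (len(with_term) / total) * entropy(with_term)
        + (len(without_term) / total) * entropy(without_term)
    )
    return entropy(labels) - conditional


def rank_terms(docs, labels):
    """Return all vocabulary terms, highest IG first (ties broken alphabetically)."""
    vocabulary = set().union(*docs)
    scores = {t: information_gain(docs, labels, t) for t in vocabulary}
    return sorted(scores, key=lambda t: (-scores[t], t))
```

A filter of this kind could, for example, keep only the top-ranked terms before a search method such as ACO explores subsets of them; whether the proposed approach combines the two in this way is not stated in the abstract.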



