Dataset Analysis Using Membership-Deviation Graph

Classification is one of the primary themes in computational biology. The accuracy of classification depends strongly on the quality of the dataset, so a method for evaluating that quality is needed. In this paper, we propose a new graphical method, the Membership-Deviation Graph (MDG), for analyzing the quality of a dataset. An MDG represents the degree of membership and the deviation of the instances of a class in the dataset. The results of MDG analysis can be used to understand the characteristics of specific features and to select the best features for classification.
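
The abstract does not say how membership and deviation are computed, so the sketch below uses stand-in definitions that are assumptions for illustration only, not the MDG formulas from this paper. For one feature, each instance gets a hypothetical membership score (the posterior of its own class under per-class 1-D Gaussians with equal priors) and a hypothetical deviation score (its z-score from its own class mean). The (membership, deviation) pairs are the points one might plot, and a feature whose instances show high membership is ranked higher. The function names `mdg_points`, `feature_score`, and `best_feature` are hypothetical.

```python
import math
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    # 1-D normal density used to score how well x fits a class.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mdg_points(values, labels, eps=1e-9):
    """Return (label, membership, deviation) for every instance on one feature.

    HYPOTHETICAL definitions, not taken from the paper:
      membership = posterior of the instance's own class under per-class
                   1-D Gaussians with equal class priors
      deviation  = |x - own class mean| / own class std (a z-score)
    """
    stats = {}
    for c in sorted(set(labels)):
        xs = [v for v, y in zip(values, labels) if y == c]
        # eps guards against a zero standard deviation.
        stats[c] = (mean(xs), max(pstdev(xs), eps))

    points = []
    for x, y in zip(values, labels):
        likes = {c: gaussian_pdf(x, mu, s) for c, (mu, s) in stats.items()}
        total = sum(likes.values())
        membership = likes[y] / total if total > 0 else 0.0
        mu, s = stats[y]
        points.append((y, membership, abs(x - mu) / s))
    return points


def feature_score(values, labels):
    # Hypothetical quality score: average membership of instances in their own class.
    return mean(m for _, m, _ in mdg_points(values, labels))


def best_feature(columns, labels):
    # Pick the feature column with the highest hypothetical score.
    return max(columns, key=lambda name: feature_score(columns[name], labels))


if __name__ == "__main__":
    labels = ["a", "a", "a", "b", "b", "b"]
    columns = {
        "f1": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # classes well separated
        "f2": [1.0, 5.0, 3.0, 1.1, 4.9, 3.1],  # classes overlap
    }
    for name, vals in columns.items():
        print(name, round(feature_score(vals, labels), 3))
    print("best:", best_feature(columns, labels))
```

On this toy data, feature `f1` separates the two classes, so its instances have membership close to 1, while `f2` gives membership near 0.5 for every instance. The Gaussian posterior is only one plausible choice. Any per-instance membership and deviation measure could be substituted without changing the overall structure.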
