Categorical Missing Data Imputation Using Fuzzy Neural Networks with Numerical and Categorical Inputs

Input feature vectors are often incomplete, and methods for handling the problem have long been studied. A common procedure is to replace each missing value with an imputed value. This paper presents a method for imputing missing categorical data from both numerical and categorical variables. The imputations are based on Simpson's fuzzy min-max neural networks, in which the input variables for learning and classification are exclusively numerical. The proposed method extends the input to categorical variables by introducing new fuzzy sets, a new operation, and a new architecture. The procedure is tested on opinion poll data and compared with other methods.
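As background for the numerical-only starting point, the sketch below illustrates the hyperbox membership function commonly given for Simpson's fuzzy min-max classifier, together with a toy imputation step that assigns the category of the best-matching hyperbox. It is a minimal illustration under stated assumptions, not the method proposed here: inputs are taken to be scaled to [0, 1], the function names, the sensitivity value `gamma=4.0`, and the example hyperboxes are hypothetical, and the extension to categorical inputs is not reproduced.

```python
from typing import List, Sequence, Tuple


def hyperbox_membership(a: Sequence[float],
                        v: Sequence[float],
                        w: Sequence[float],
                        gamma: float = 4.0) -> float:
    """Membership of point `a` in the hyperbox with min point `v` and max point `w`.

    Returns 1.0 when `a` lies inside the box and decreases as `a` moves away;
    `gamma` (an assumed value here) controls how fast it decreases.
    """
    n = len(a)
    total = 0.0
    for a_i, v_i, w_i in zip(a, v, w):
        # Penalty for exceeding the max point in this dimension.
        above = max(0.0, 1.0 - max(0.0, gamma * min(1.0, a_i - w_i)))
        # Penalty for falling below the min point in this dimension.
        below = max(0.0, 1.0 - max(0.0, gamma * min(1.0, v_i - a_i)))
        total += above + below
    return total / (2 * n)


# A hyperbox is (min point, max point, category label); hypothetical structure.
Hyperbox = Tuple[Sequence[float], Sequence[float], str]


def impute_category(a: Sequence[float], boxes: List[Hyperbox]) -> str:
    """Return the label of the hyperbox with the highest membership for `a`."""
    best = max(boxes, key=lambda box: hyperbox_membership(a, box[0], box[1]))
    return best[2]


if __name__ == "__main__":
    # Two hypothetical hyperboxes over two numerical features in [0, 1].
    boxes = [
        ([0.1, 0.1], [0.3, 0.3], "party_A"),
        ([0.6, 0.6], [0.9, 0.9], "party_B"),
    ]
    print(impute_category([0.7, 0.65], boxes))
```

In this toy setting the missing categorical value is simply the class of the hyperbox with the largest membership; a trained network would build the hyperboxes from the complete records rather than take them as given.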




