Neural Network Imputation in Complex Survey Design

Missing data pose many analysis challenges. Under a complex survey design, researchers must deal with missing data and also account for the sampling design to achieve useful inferences. Methods for incorporating sampling weights in neural network imputation were investigated to account for complex survey designs. A variance estimate that accounts for both the imputation uncertainty and the sampling design under neural network imputation is provided. A simulation study was conducted to compare estimation results based on complete case analysis, multiple imputation using Markov chain Monte Carlo, and neural network imputation. Furthermore, a public-use dataset was used as an example to illustrate neural network imputation under a complex survey design.
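
As a rough illustration of how sampling weights might enter a neural network imputation, the sketch below trains a single linear unit (the simplest possible network) by gradient descent on a weight-scaled squared error, then fills in the missing responses with its predictions. This is not the authors' method. The data, the weights, the missingness mechanism, and the function names `fit_weighted_linear_unit` and `impute` are all hypothetical, and the sketch makes no attempt at the variance estimation the abstract describes.

```python
import numpy as np


def fit_weighted_linear_unit(X, y, w, lr=0.1, epochs=2000):
    """Train one linear unit by gradient descent on the weighted loss
    sum_i w_i * (y_i - x_i'beta)^2 / sum_i w_i."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = w / w.sum()                             # normalise the sampling weights
    beta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        resid = Xb @ beta - y
        grad = 2.0 * Xb.T @ (w * resid)         # gradient of the weighted loss
        beta -= lr * grad
    return beta


def impute(X, y, w):
    """Replace NaN entries of y with predictions from the weighted fit."""
    observed = ~np.isnan(y)
    beta = fit_weighted_linear_unit(X[observed], y[observed], w[observed])
    Xb = np.column_stack([np.ones(len(X)), X])
    y_filled = y.copy()
    y_filled[~observed] = Xb[~observed] @ beta
    return y_filled, beta


# Hypothetical data: one covariate, a linear response, unequal weights.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=n)
w = rng.uniform(1.0, 5.0, size=n)       # assumed sampling weights
y_obs = y.copy()
y_obs[rng.random(n) < 0.3] = np.nan     # roughly 30% missing at random

y_imputed, beta = impute(x.reshape(-1, 1), y_obs, w)

# Weighted estimate of the population mean from the completed data.
weighted_mean = np.sum(w * y_imputed) / np.sum(w)
```

A network with hidden units would replace the linear prediction with a nonlinear one, but the weights would enter the loss in the same way. Treating the imputed values as if they were observed understates the variance, which is why a separate variance estimate is needed.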

