Abstract: This study demonstrates an alternative stochastic imputation approach for large datasets when preferred commercial packages struggle to iterate due to numerical problems. A large country conflict dataset motivates the search to impute missing values well over a common threshold of 20% missingness. The methodology capitalizes on correlation while using model residuals to provide the uncertainty in estimating unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Static tolerances common in most packages are replaced with tailorable tolerances that exploit residuals to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the country conflict dataset illustrates promise with modeling first-order interactions, while presenting a need for further refinement that mimics predictive mean matching.
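The idea of using model residuals to carry the uncertainty of an imputed value can be sketched as follows. This is a minimal illustration, not the study's procedure: it assumes a single predictor, a plain linear fit, and synthetic data, and the function name `stochastic_regression_impute` is hypothetical.

```python
import numpy as np

def stochastic_regression_impute(x, y, rng):
    """Fill NaNs in y with a linear prediction from x plus a resampled residual."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    missing = ~observed

    # Least-squares fit on the complete cases only.
    design = np.column_stack([np.ones(observed.sum()), x[observed]])
    coef, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    residuals = y[observed] - design @ coef

    # Deterministic prediction for the missing entries...
    prediction = coef[0] + coef[1] * x[missing]
    # ...plus a residual drawn at random from the fitted model's residuals.
    noise = rng.choice(residuals, size=missing.sum(), replace=True)

    filled = y.copy()
    filled[missing] = prediction + noise
    return filled

# Synthetic example (assumed data, not the country conflict dataset).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)
y_missing = y.copy()
y_missing[::5] = np.nan  # hide every fifth value
y_filled = stochastic_regression_impute(x, y_missing, rng)
```

Because the true values are known here, the imputed entries can be compared with the hidden ones, which mirrors the known-versus-replaced comparison the abstract describes.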
Abstract: Assessing several individuals intensively over time
yields intensive longitudinal data (ILD). Even though ILD provide
rich information, they also bring other data analytic challenges. One
of these is the increased occurrence of missingness with increased
study length, possibly under non-ignorable missingness scenarios.
Multiple imputation (MI) handles missing data by creating several
imputed data sets, and pooling the estimation results across imputed
data sets to yield final estimates for inferential purposes. In this
article, we introduce dynr.mi(), a function in the R package,
Dynamic Modeling in R (dynr). The package dynr provides a suite
of fast and accessible functions for estimating and visualizing the
results from fitting linear and nonlinear dynamic systems models in
discrete as well as continuous time. By integrating the estimation
functions in dynr and the MI procedures available from the R
package, Multivariate Imputation by Chained Equations (MICE), the
dynr.mi() routine is designed to handle possibly non-ignorable
missingness in the dependent variables and/or covariates in a
user-specified dynamic systems model via MI, with convergence
diagnostic checks. We used dynr.mi() to examine, in the context
of a vector autoregressive model, the relationships among individuals’
ambulatory physiological measures, and self-report affect valence
and arousal. The results from MI were compared to those from
listwise deletion of entries with missingness in the covariates.
When we determined the number of iterations based on the
convergence diagnostics available from dynr.mi(), differences in
the statistical significance of the covariate parameters were observed
between the listwise deletion and MI approaches. These results
underscore the importance of considering diagnostic information in
the implementation of MI procedures.
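The pooling step of MI can be sketched with Rubin's rules, the standard way of combining estimates across imputed data sets. That dynr.mi() pools in exactly this way is an assumption here, the function name `pool_rubin` is hypothetical, and the numbers are invented for illustration.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)   # average of the m point estimates
    within = mean(variances)            # average squared standard error
    between = variance(estimates)       # sample variance of the m estimates
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total

# Invented estimates and squared standard errors from m = 3 imputations.
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
pooled_estimate, pooled_variance = pool_rubin(estimates, variances)
```

The between-imputation term is what makes the pooled variance larger than the variance from any single imputed data set, so significance tests reflect the uncertainty due to missingness.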
Abstract: We explore the relationship between internal migration
and poverty in Tunisia. We present a methodology combining
the potential outcomes approach with multiple imputation to highlight the
effect of internal migration on poverty states. We find that the probability
of being poor decreases when leaving the poorest regions (the western
areas) for the richer regions (Greater Tunis and the eastern regions).
Abstract: Analyzing DNA microarray data sets is a great
challenge for bioinformaticians because of the complexity
of applying statistical and machine learning techniques. The challenge
is compounded when the microarray data sets contain missing data,
which occurs regularly, because these techniques cannot deal with
missing data. One of the most important data analysis processes on
microarray data sets is feature selection. This process finds the
most important genes that affect a certain disease. In this paper, we
introduce a technique for imputing the missing data in microarray
data sets while performing feature selection.
Abstract: Missing data yield many analysis challenges. In the case of a complex survey design, in addition to dealing with missing data, researchers need to account for the sampling design to achieve useful inferences. Methods for incorporating sampling weights in neural network imputation were investigated to account for complex survey designs. An estimate of variance that accounts for the imputation uncertainty as well as the sampling design using neural networks will be provided. A simulation study was conducted to compare estimation results based on complete case analysis, multiple imputation using Markov chain Monte Carlo, and neural network imputation. Furthermore, a public-use dataset was used as an example to illustrate neural network imputation under a complex survey design.
Abstract: The occurrence of missing values in databases is a serious problem for data mining tasks, degrading data quality and the accuracy of analyses. The area lacks standardized experiments for treating missing values, and the absence of common parameters makes it difficult to compare evaluations across studies. This paper proposes a testbed intended to facilitate the implementation of experiments and to provide unbiased parameters, using available datasets and suitable performance metrics, in order to optimize the evaluation and comparison of state-of-the-art missing-value treatments.
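One way such a testbed could make comparisons repeatable is to fix the masking seed and score every treatment on the same hidden cells. The sketch below is an assumption about how this might look, not the paper's testbed: `evaluate_imputer` and `column_mean_imputer` are hypothetical names, RMSE is one possible metric, and the data are synthetic.

```python
import numpy as np

def evaluate_imputer(data, imputer, missing_rate, seed):
    """Hide a random share of known cells, impute them, and return the RMSE."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    mask = rng.random(data.shape) < missing_rate  # cells to hide
    corrupted = data.copy()
    corrupted[mask] = np.nan
    imputed = imputer(corrupted)
    # Score only the cells that were hidden, against their known values.
    return float(np.sqrt(np.mean((imputed[mask] - data[mask]) ** 2)))

def column_mean_imputer(matrix):
    """Baseline treatment: replace each NaN with its column mean."""
    filled = matrix.copy()
    col_means = np.nanmean(matrix, axis=0)
    rows, cols = np.where(np.isnan(matrix))
    filled[rows, cols] = col_means[cols]
    return filled

# Synthetic complete data set (assumed, for illustration only).
data = np.random.default_rng(1).normal(size=(200, 3))
rmse = evaluate_imputer(data, column_mean_imputer, missing_rate=0.2, seed=42)
```

Any other treatment can be passed in place of `column_mean_imputer`, and with the same seed it is scored on exactly the same hidden cells.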
Abstract: Missing data is a persistent problem in almost all
areas of empirical research. The missing data must be treated very
carefully, as data plays a fundamental role in every analysis.
Improper treatment can distort the analysis or generate biased results.
In this paper, we compare and contrast various imputation techniques
on missing data sets and make an empirical evaluation of these
methods so as to construct quality software models. Our empirical
study is based on two public NASA datasets, KC4 and KC1. The
actual data sets, of 125 and 2107 cases respectively and without
any missing values, were considered. The data sets were used to create
Missing at Random (MAR) data. Listwise Deletion (LD), Mean
Substitution (MS), Interpolation, Regression with an error term, and
Expectation-Maximization (EM) approaches were used to compare
the effects of the various techniques.
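A small sketch of how MAR data can be generated from a complete data set, and of one known effect of mean substitution, is given below. It uses synthetic data rather than KC1 or KC4, and the logistic missingness model is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)                      # fully observed variable
y = x + rng.normal(scale=0.5, size=n)       # variable that will lose values

# MAR: the chance that y is missing depends only on the observed x.
p_missing = 1.0 / (1.0 + np.exp(-x))
missing = rng.random(n) < p_missing

# Listwise deletion keeps only the cases where y is observed.
y_obs = y[~missing]

# Mean substitution fills every gap with the observed mean.
y_mean_sub = y.copy()
y_mean_sub[missing] = y_obs.mean()

sd_listwise = y_obs.std(ddof=1)
sd_mean_sub = y_mean_sub.std(ddof=1)
```

Mean substitution leaves the mean of `y` unchanged relative to the observed cases but always shrinks the standard deviation, because the filled cells add no spread while the sample size grows. Regression with an error term is one way to avoid that shrinkage.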
Abstract: There are many situations where input feature vectors are incomplete and methods to tackle the problem have been studied for a long time. A commonly used procedure is to replace each missing value with an imputation. This paper presents a method to perform categorical missing data imputation from numerical and categorical variables. The imputations are based on Simpson's fuzzy min-max neural networks where the input variables for learning and classification are just numerical. The proposed method extends the input to categorical variables by introducing new fuzzy sets, a new operation and a new architecture. The procedure is tested and compared with others using opinion poll data.
Abstract: In this paper, we discuss the egalitarianism solution (ES) and center-of-gravity of the imputation-set value (CIV) for bicooperative games, which can be seen as extensions of the solutions for traditional games given by Dutta and Ray [1] and Driessen and Funaki [2]. Furthermore, axiomatic systems for the given values are proposed. Finally, a numerical example is offered to illustrate the player ES and CIV.