Indian Journal of Science and Technology
Year: 2018, Volume: 11, Issue: 1, Pages: 1-11
Tlhalitshi Volition Montshiwa*, Ntebo Moroke and Elias Munapo
Department of Statistics and Operations Research, North West University Mafikeng Campus, South Africa
*Author for correspondence
Tlhalitshi Volition Montshiwa,
Department of Statistics and Operations Research, North West University Mafikeng Campus, South Africa
Objectives: This study investigated the efficiency of Multiple Imputation (MI) and Maximum Likelihood (ML) methods for estimating missing values. The findings are used to make recommendations for future studies on how missing data imputation affects the accuracy of the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).
Methods: The complete dataset (with no missing values) used in this study was collected in 2010/11 through the Income and Expenditure Survey (IES) and had 25328 observations. Missing data were generated by randomly deleting 10%, 20%, 30%, 40% and 50% of the values from the complete dataset. The missing values in each of the five resulting datasets were imputed using MI and ML methods. Subsequently, the absolute errors of AIC and BIC from multiple regression analysis were computed for each dataset and compared across the imputation methods.
Findings: AIC and BIC are more accurate when missing values are estimated by the Full Information Maximum Likelihood (FIML) algorithm of the ML method, provided 10% of the data are missing. For all datasets, AIC and BIC were least accurate when missing values were imputed by the Expectation Maximisation (EM) algorithm of the ML method. The findings also showed that AIC and BIC are more accurate when the rate of missingness is large, provided missing values are estimated using either of the MI algorithms, Fully Conditional Specification (FCS) or Markov Chain Monte Carlo (MCMC).
Application: When the rate of missingness is small (at most 10%), FIML should be used to handle missing data if AIC and BIC are to be used. Both FCS and MCMC should be preferred over the EM algorithm when the rate of missingness is high (at least 40% missing).
Keywords: Maximum Likelihood Imputation, Multiple Imputation, AIC, BIC
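The evaluation procedure summarised under Methods (delete values at random, impute, refit the regression, and measure the absolute error of AIC and BIC) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the study's actual setup: the data are simulated instead of taken from the IES, values are deleted from the predictors only, the reference values are taken to be the AIC and BIC of the complete data, the parameter count k includes the error variance, and simple column-mean imputation stands in for the MI and ML algorithms (FCS, MCMC, EM, FIML) that the study compared.

```python
import numpy as np


def ols_aic_bic(X, y):
    """Return (AIC, BIC) for a Gaussian linear regression of y on X plus intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    # Assumed convention: count the regression coefficients plus the error variance.
    k = design.shape[1] + 1
    # Maximised Gaussian log-likelihood with sigma^2 estimated as RSS / n.
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik


def delete_at_random(X, rate, rng):
    """Set a fraction `rate` of the cells of X to NaN, chosen uniformly at random."""
    X_missing = X.copy()
    n_delete = int(round(rate * X.size))
    idx = rng.choice(X.size, size=n_delete, replace=False)
    X_missing.flat[idx] = np.nan
    return X_missing


def mean_impute(X_missing):
    """Placeholder imputer (NOT one of the study's methods): fill NaNs with column means."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)


# Simulated stand-in for the complete dataset (hypothetical coefficients).
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

# Reference values from the complete data.
aic_full, bic_full = ols_aic_bic(X, y)

# Absolute errors of AIC and BIC at each rate of missingness.
errors = {}
for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    X_imputed = mean_impute(delete_at_random(X, rate, rng))
    aic, bic = ols_aic_bic(X_imputed, y)
    errors[rate] = (abs(aic - aic_full), abs(bic - bic_full))
    print(f"rate={rate:.0%}  |AIC error|={errors[rate][0]:.2f}  |BIC error|={errors[rate][1]:.2f}")
```

To compare imputation methods in the way the abstract describes, `mean_impute` would be replaced by each method in turn and the resulting error tables set side by side; the numbers printed by this sketch say nothing about the study's own results.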