Indian Journal of Science and Technology
Year: 2019, Volume: 12, Issue: 14, Pages: 1-11
Amjad Ali* and Qamruz Zaman
Department of Statistics, University of Peshawar, Pakistan
*Author for correspondence
Department of Statistics, University of Peshawar, Pakistan.
Objective: To propose a new imputation method for missing data and to examine the misclassification rate and the Out-of-Bag (OoB) error on simulated and real data.
Method: A new imputation method is proposed for the IN/OUT procedure of Random Forest (RF). Its principal advantage is that it does not depend on the missing-data mechanism. The method was evaluated by comparing its results with those obtained from data sets without missing values.
Findings and Conclusion: The proposed method reduced both the OoB error and the misclassification error rate in the presence of missing values, for the IN/OUT procedure of RF and for the conventional RF procedure, at different percentages of missing data. The method gives interesting results for 5 to 15% missing data; beyond that range the results remain the same, so there is no need to compute them for higher percentages of missing values. Most importantly, this is the first time such a method has been developed for the IN/OUT procedure of RF and for conventional RF.
Novelty/Motivation: Missing values are a serious problem in all statistical analyses, and RF and IN/OUT RF are no exception. Therefore, a bootstrap-based method for imputing missing values in IN/OUT RF was developed.
Keywords: Classification and Regression Tree (CART), Misclassification, Out of Bag (OoB), Random Forest (RF)
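The quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' imputation method, whose details are not given here. It assumes a simple stand-in: each missing value is filled by a draw, with replacement, from the observed values of the same variable. It also assumes that the in-bag and out-of-bag sets come from one ordinary bootstrap sample. The function names `bootstrap_impute`, `in_out_split`, and `misclassification_rate` are hypothetical and chosen only for illustration.

```python
import random


def bootstrap_impute(column, rng):
    """Fill None entries by resampling, with replacement, from observed values.

    Assumption: a generic bootstrap-style fill, not the paper's procedure.
    """
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to resample from")
    return [v if v is not None else rng.choice(observed) for v in column]


def in_out_split(n, rng):
    """Draw one bootstrap sample of size n.

    Returns (in_bag, out_of_bag): in_bag may repeat indices; out_of_bag
    lists the indices that were never drawn.
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


def misclassification_rate(y_true, y_pred):
    """Fraction of labels that were predicted incorrectly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x = [1.2, None, 3.4, 2.2, None, 5.0]  # toy variable with two missing values
x_filled = bootstrap_impute(x, rng)
in_bag, out_of_bag = in_out_split(len(x_filled), rng)
```

In a random forest, each tree would be grown on its in-bag cases, and the OoB error would be the misclassification rate of predictions made for each case using only the trees for which that case was out of bag.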