Indian Journal of Science and Technology
DOI: 10.17485/IJST/v16i43.1864
Year: 2023, Volume: 16, Issue: 43, Pages: 3862-3874
Original Article
Robindro Singh Khumukcham1*, Devi Mayanglambam1, Boby Clinton Urikhimbam1, Nazrul Hoque1
1Department of Computer Science, Manipur University, Canchipur, Imphal, 795003, Manipur, India
*Corresponding Author
Email: [email protected]
Received Date: 10 August 2023, Accepted Date: 16 October 2023, Published Date: 14 November 2023
Objectives: To estimate the missing values of a dataset that contains both numeric and categorical attributes. Developing a missing value imputation method that handles mixed-type data is an important problem for machine learning researchers.
Methods: We developed a method called MVI-DR to estimate the missing values of a mixed-type dataset. MVI-DR uses Linear Regression (LiR) to compute missing numeric values and Decision Trees (DT) to compute missing categorical values. The method is validated using five classifiers, namely Logistic Regression (LoR), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), DT, and Random Forest (RF), on nine mixed-type datasets taken from the UCI and Kaggle repositories.
Findings: The experimental results show that MVI-DR effectively estimates missing values for both numeric and categorical data types. In particular, on the Car, Lung Cancer, Thyroid, Melb, and Penguins datasets, the proposed method gives 75.7% accuracy, whereas the traditional method gives 75.6% accuracy, using the LoR model. Similarly, on the Lung Cancer dataset, MVI-DR yields 66.3%, 57.2%, 61.1%, 51%, and 60.7% accuracy, whereas the traditional method gives 62.2%, 53.2%, 55.8%, 47.4%, and 58.6%, using the LoR, k-NN, SVM, DT, and RF classifiers, respectively. In addition to accuracy, the proposed method yields better results on most of the datasets in terms of the Matthews correlation coefficient (MCC). Moreover, the proposed method performed better on high-dimensional mixed-type datasets.
Novelty: A new missing value imputation method, MVI-DR, is developed. The method handles both numeric and categorical data types and is evaluated in terms of Accuracy, F1-score, and MCC.
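The division of labour described in the Methods (regression for numeric attributes, a decision tree for categorical attributes) can be illustrated with a minimal sketch. This is not the authors' MVI-DR implementation, whose details the abstract does not give. To stay dependency-free, it swaps in two simplifications: a one-predictor least-squares line in place of a full linear regression, and a one-level decision tree (a stump) in place of a full decision tree. All function names, column names, and data values below are hypothetical.

```python
from collections import Counter, defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def impute_numeric(rows, target, predictor):
    """Fill missing (None) numeric values of `target` from `predictor`."""
    known = [r for r in rows
             if r[target] is not None and r[predictor] is not None]
    intercept, slope = fit_line([r[predictor] for r in known],
                                [r[target] for r in known])
    for r in rows:
        if r[target] is None and r[predictor] is not None:
            r[target] = intercept + slope * r[predictor]


def fit_stump(rows, target, features):
    """One-level tree: choose the feature whose per-value majority class
    makes the fewest errors on the rows where `target` is known."""
    best = None
    for f in features:
        groups = defaultdict(Counter)
        for r in rows:
            groups[r[f]][r[target]] += 1
        errors = sum(sum(c.values()) - max(c.values())
                     for c in groups.values())
        if best is None or errors < best[0]:
            rule = {v: c.most_common(1)[0][0] for v, c in groups.items()}
            best = (errors, f, rule)
    # Fallback class for feature values never seen during fitting.
    default = Counter(r[target] for r in rows).most_common(1)[0][0]
    return best[1], best[2], default


def impute_categorical(rows, target, features):
    """Fill missing (None) categorical values of `target` with the stump."""
    known = [r for r in rows if r[target] is not None]
    feature, rule, default = fit_stump(known, target, features)
    for r in rows:
        if r[target] is None:
            r[target] = rule.get(r[feature], default)


# Hypothetical mixed-type toy data; None marks a missing value.
rows = [
    {"size": 1.0, "price": 2.0, "type": "small", "fuel": "petrol"},
    {"size": 2.0, "price": 4.0, "type": "small", "fuel": "petrol"},
    {"size": 3.0, "price": 6.0, "type": "large", "fuel": "diesel"},
    {"size": 4.0, "price": None, "type": "large", "fuel": None},
]

impute_numeric(rows, target="price", predictor="size")
impute_categorical(rows, target="fuel", features=["type"])
print(rows[3])
```

In this toy data the known prices lie exactly on the line price = 2 * size, so the missing price is filled with 8.0, and the stump assigns "diesel" because the only known "large" row has that fuel. A fuller version would use several predictors per attribute and a deeper tree, for example via a library such as scikit-learn.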
Keywords: Numeric, Categorical, Imputation, MVIDR, Machine Learning, Mixed type
© 2023 Khumukcham et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)