P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2023, Volume: 16, Issue: 8, Pages: 547-556

Original Article

Genetic Algorithm Based Over-Sampling with DNN in Classifying the Imbalanced Data Distribution Problem

Received Date: 26 April 2022, Accepted Date: 13 January 2023, Published Date: 27 February 2023

Abstract

Objective: Data imbalance arises in many real-life applications. In imbalanced datasets, the minority class data causes wrong inferences during classification, which leads to more misclassification. Much research has addressed this issue, but no generally effective solution for efficient classification has been found so far. After analyzing the existing literature, this work proposes to minimize misclassification through genetic-algorithm-based oversampling combined with a deep neural network (DNN) classifier.

Method: In the proposed oversampling method, synthetic samples are generated with a genetic algorithm. The initial population is generated using the Gaussian weight initialization technique, and the fittest individuals in the population are selected by Euclidean distance for further processing. Synthetic data amounting to double the minority class size are generated, and the resulting dataset is classified with the DNN.

Findings: The performance of the DNN classifier on the oversampled training data is compared with that of the C4.5 and Support Vector Machine (SVM) classifiers, and the DNN classifier outperforms both. Data generated using SMOTE and ADASYN are also considered for comparison, and the proposed approach outperforms these approaches. The experiments further show that misclassification is reduced and that the proposed method is comparatively better, with a statistically significant difference.

Novelty: The novelty of this work lies in generating the initial population by Gaussian weight initialization, selecting the fittest samples by a Euclidean distance measure, generating synthetic samples of double the minority class size, and using a DNN for classification to reduce misclassification.
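The oversampling pipeline described in the Method can be illustrated with a minimal sketch using only the Python standard library. The abstract does not specify the Gaussian parameters, the direction of the fitness measure, or the crossover and mutation operators, so the following are assumptions made purely for illustration: candidates are drawn from a per-feature Gaussian fitted to the minority class, fitness is the Euclidean distance to the nearest real minority sample (smaller is treated as fitter), crossover is uniform, and mutation adds small Gaussian noise. The function names and default values (`ga_oversample`, `generations`, `mutation_scale`) are hypothetical and are not taken from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fitness(candidate, minority):
    """Assumed fitness: distance to the nearest real minority sample."""
    return min(euclidean(candidate, m) for m in minority)


def initial_population(minority, pop_size, rng):
    """Assumed Gaussian initialization: sample each feature from
    N(mean, std) estimated on the minority class."""
    n = len(minority)
    n_features = len(minority[0])
    means = [sum(m[j] for m in minority) / n for j in range(n_features)]
    stds = [
        math.sqrt(sum((m[j] - means[j]) ** 2 for m in minority) / n)
        for j in range(n_features)
    ]
    return [
        [rng.gauss(means[j], stds[j]) for j in range(n_features)]
        for _ in range(pop_size)
    ]


def ga_oversample(minority, generations=20, mutation_scale=0.05, seed=0):
    """Return synthetic samples numbering twice the minority class size."""
    rng = random.Random(seed)
    target = 2 * len(minority)   # "double the minority class size"
    pop_size = 2 * target        # assumed working population size
    population = initial_population(minority, pop_size, rng)

    for _ in range(generations):
        # Selection: keep the candidates closest to real minority samples.
        population.sort(key=lambda c: fitness(c, minority))
        parents = population[:target]

        # Reproduction: uniform crossover followed by Gaussian mutation.
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            child = [x + rng.gauss(0.0, mutation_scale) for x in child]
            children.append(child)
        population = parents + children

    population.sort(key=lambda c: fitness(c, minority))
    return population[:target]
```

Calling `ga_oversample([[1.0, 2.0], [1.5, 1.8], [0.9, 2.2]])` on a toy minority class of three samples returns six synthetic two-feature vectors. These would then be appended to the training set before fitting a classifier such as a DNN. A nearest-sample fitness like the one assumed here tends to pull candidates towards existing minority points, so a practical implementation would likely need a diversity term or a different selection criterion.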

Keywords: Genetic algorithm; Gauss weight initialization; SMOTE; ADASYN; Imbalanced data; Classification

Copyright

© 2023 Karthikeyan & Kathirvalavakumar. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)
