Indian Journal of Science and Technology
Year: 2023, Volume: 16, Issue: 8, Pages: 547-556
S Karthikeyan1*, T Kathirvalavakumar2
1Research Scholar, Research centre in Computer Science, V.H.N.Senthikumara Nadar College, Virudhunagar, Tamil Nadu, India
2Associate Professor, Research centre in Computer Science, V.H.N.Senthikumara Nadar College, Virudhunagar, Tamil Nadu, India
Email: [email protected]
Received Date:26 April 2022, Accepted Date:13 January 2023, Published Date:27 February 2023
Objective: Data imbalance exists in many real-life applications. In the imbalanced datasets, the minority class data creates a wrong inference during the classification that leads to more misclassification. More research has been done in the past to solve this issue, but as of now there is no global working solution found to do efficient classification. After analyzing various existing literatures, it is proposed to minimize the misclassification through genetic based oversampling and deep neural network (DNN) classifier. Method: In the proposed oversampling method synthetic samples are generated based on genetic algorithm. Initial populations for the genetic algorithm are generated using Gaussian weight initialization technique and the fittest individual from the population are selected by Euclidean distance for further processing to generate synthetic data in double the minority class size and the dataset is classified with the DNN. Findings: The performance of the oversampled training data with DNN Classifier is compared with C4.5 and Support Vector Machine (SVM) classifiers and found that the DNN classifier outperforms the other two classifiers. The data generated using SMOTE and ADASYN are considered for comparison. It is found that the proposed approach outperforms the other approaches. It is also proved from the experiment that misclassification is reduced and the proposed method is statistically significant and is comparatively better. Novelty: Initial population generation by Gaussian weight initialization, the fittest sample selection by Euclidean distance measure, synthetic samples with double the minority class size and DNN for classification to reduce the misclassification is novelty in this work.
Keywords: Genetic algorithm; Gauss weight initialization; SMOTE; ADASYN; Imbalanced data; Classification
© 2023 Karthikeyan & Kathirvalavakumar. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)
Subscribe now for latest articles and news.