Improving Smell Prediction: Developing an Improved Model with Supervised Learning Techniques

Neha Kumari and Satwinder Singh

doi:10.17485/ijst/2017/v10i24/115002

Indian Journal of Science and Technology

Year: 2017, Volume: 10, Issue: 24, Pages: 1-11

Original Article

Department of Computer Science and Technology, Central University of Punjab, City Campus, Manasa Road, Bathinda – 151001, Punjab, India

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: To build a code smell prediction model using supervised learning techniques. The aim is a model that reports fewer false positive code smells. The proposed model is validated using 10-fold cross validation.

Methods/Statistical Analysis: Two code smell detection tools, iPlasma and PMD, are used to build the smell prediction model, and the metrics are extracted using Understand and iPlasma. Two experiments are performed: one using the code smells detected by PMD and the other using the code smells detected by iPlasma. The PMD smells are associated with the metrics extracted using Understand. The model is then trained with three supervised learning algorithms (classifiers): Random Forest, Naïve Bayes and KStar.

Findings: In the PMD experiment, Random Forest predicts fewer false positive and false negative code smells than the other two classifiers, since its Precision and Recall are higher on every dataset. Its ROC value is the highest on some datasets, while on others the ROC value of KStar is higher. In the iPlasma experiment, Random Forest again predicts code smells more accurately than the other two classifiers and produces fewer false positive and false negative code smells. There is one exception: on one dataset, Random Forest and KStar both reach 100% accuracy, i.e. Precision and Recall both equal 1, which means that neither classifier predicts any false positive or false negative code smells. In this experiment the ROC value of Random Forest is also higher than that of the other two classifiers, and on some datasets it equals 1. It is therefore concluded that Random Forest gives the best code smell prediction model.

Application/Improvements: The results of these experiments show only one case in which the models predict no false positive or false negative code smells. The model could be improved so that it yields zero false positive code smells on every dataset.
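The link the abstract draws between Precision, Recall and the number of false positive and false negative code smells, together with the 10-fold split used for validation, can be illustrated with a minimal sketch. This is not the authors' implementation, since the abstract does not describe how the classifiers were run. The function names `precision_recall` and `k_fold_indices` and the toy labels are hypothetical, and the folds here are neither shuffled nor stratified, which is a simplifying assumption.

```python
from typing import List, Sequence, Tuple


def precision_recall(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class (1 = smelly, 0 = clean)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # smells correctly predicted
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positive smells
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negative smells
    # Precision falls as false positives rise; recall falls as false negatives rise.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k (train, test) index pairs of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Hypothetical labels for eight classes: one false positive and one false negative.
toy_actual = [1, 1, 1, 0, 0, 0, 0, 1]
toy_predicted = [1, 1, 0, 1, 0, 0, 0, 1]
toy_precision, toy_recall = precision_recall(toy_actual, toy_predicted)
```

With these toy labels both values come out at 0.75. Precision and Recall both equal 1 only when there are no false positives and no false negatives, which is the case reported for one dataset above.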

Keywords: Automatic Tools, Bad Code Smells, Code Smell, Code Smell Prediction, Supervised Learning Techniques