Optimizing the Prediction of Bagging and Boosting

B. V. Sumana and T. Santhanam

doi:10.17485/ijst/2015/v8i35/78449

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 35, Pages: 1-13

Original Article

B. V. Sumana¹* and T. Santhanam²

¹Department of Computer Science, Vijaya College, Jayanagar, Bangalore - 560011, Karnataka, India
²Department of Computer Applications, DG Vaishnav College, Chennai - 600106, Tamil Nadu, India

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background/Objectives: For more than a decade, ensemble methods such as Bagging and Boosting have drawn great attention from researchers aiming to improve prediction accuracy over single classifiers. However, some recent studies have noted that Bagging and Boosting do not always improve accuracy; they help only when the base classifier is unstable. To overcome this problem, this paper proposes a Hybrid Ensemble Model with two phases of preprocessing and evaluates it using 9 classifiers on 3 benchmark data sets from the UCI Repository. Methods: In the first preprocessing phase, feature selection is performed using Correlation-based Feature Selection (CFS) to select the attributes most highly correlated with the class. In the second phase, the K-means clustering algorithm is applied to remove incorrectly classified instances. Finally, the instances remaining after both phases are used to train Bagging and Boosting ensembles, producing the final Hybrid Ensemble Classifier Model (HECM), under 10-fold cross-validation. The results were evaluated using the confusion matrix and performance measures such as accuracy, kappa, mean absolute error, and the time taken to build the model. Findings: The results show that the proposed model is more efficient than the existing models and improves accuracy for both stable and unstable classifiers by 2% to 30.14% over the traditional ensemble model, depending on the complexity of the algorithm.
Keywords: Bagging, Boosting, Classification, Correlation Based Feature Selection (CFS), Hybrid, K-Means
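
The abstract does not spell out how K-means is used to remove incorrectly classified instances. One plausible reading, sketched below purely as an assumption, is to cluster the data into as many clusters as there are classes, label each cluster with its majority class, and drop every instance whose own class label disagrees with that of its cluster. The function names (`kmeans`, `remove_misclustered`), the choice of the first k points as initial centroids, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def remove_misclustered(points, labels, k=None):
    """Drop instances whose label differs from the majority label of their cluster."""
    if k is None:
        k = len(set(labels))  # assumption: one cluster per class
    assignment = kmeans(points, k)
    majority = {}
    for c in set(assignment):
        votes = Counter(l for l, a in zip(labels, assignment) if a == c)
        majority[c] = votes.most_common(1)[0][0]
    keep = [i for i, (l, a) in enumerate(zip(labels, assignment)) if majority[a] == l]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy data: the last instance sits in the "a" group but is labelled "b".
points = [[1.0, 1.0], [9.0, 9.0], [1.2, 0.8], [8.8, 9.1],
          [0.9, 1.1], [9.2, 8.9], [1.1, 1.0]]
labels = ["a", "b", "a", "b", "a", "b", "b"]

clean_points, clean_labels = remove_misclustered(points, labels)
print(clean_labels)  # ['a', 'b', 'a', 'b', 'a', 'b']
```

In the pipeline described in the abstract, a filter of this kind would run after feature selection, and only the retained instances would be passed on to the Bagging and Boosting ensembles.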