Improving Classification Accuracy based on Random Forest Model through Weighted Sampling for Noisy Data with Linear Decision Boundary

S. Sasikala, S. Bharathidason and C. Jothi Venkateswaran

doi:10.17485/ijst/2015/v8iS8/71714

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: Supplementary 8, Pages: 1-6

Original Article

S. Sasikala¹, S. Bharathidason²* and C. Jothi Venkateswaran³

¹Department of Computer Science, IDE, University of Madras, Chennai - 600005, India; [email protected]
²Department of Computer Science, Loyola College, Chennai, India; [email protected]
³Department of Computer Science, Presidency College, Chennai, India; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background: Random forest algorithms typically build their decision trees from a simple random sample of observations. Random selection allows noisy, outlying and non-informative data points to enter tree construction, which can lead to poor ensemble classification decisions. This paper aims to optimize sample selection through probability proportional to size sampling (weighted sampling), in which noisy, outlying and non-informative data points are down-weighted to improve the classification accuracy of the model. Methods: The weight of each data point is determined from two aspects: the influence of the data point on the model, found through the Leave-One-Out method using a single classification tree, and the deviance residual of the data point, measured using a logistic regression model. These two quantities are combined as the final weight. Results: The proposed Finest Random Forest (FRF) performs consistently better than the conventional Random Forest (RF) in terms of classification accuracy. Conclusion: Classification accuracy improves when random forest is combined with probability proportional to size sampling (weighted sampling) for noisy data with a linear decision boundary.
Keywords: Classification Accuracy, Decision Trees, Noisy Data, Outlier, Random Forest, Weighted Sampling
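
The weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The deviance residual follows the standard definition for a binary logistic model; the leave-one-out influence scores are taken as given inputs rather than computed from a classification tree; and the rule in `combine_weights` is a hypothetical choice, since the abstract does not state how the two quantities are combined. All function and variable names are illustrative assumptions.

```python
import math
import random


def deviance_residual(y, p):
    """Deviance residual of a binary label y (0 or 1) given predicted P(y=1) = p."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
    log_lik = y * math.log(p) + (1 - y) * math.log(1.0 - p)
    # The sign follows (y - p); the magnitude is sqrt(-2 * log-likelihood).
    return math.copysign(math.sqrt(-2.0 * log_lik), y - p)


def combine_weights(influence, residuals):
    """Hypothetical combination rule (an assumption, not taken from the paper):
    points with large influence or large |residual| receive smaller weights.
    The returned weights are normalized to sum to 1."""
    raw = [1.0 / ((1.0 + abs(i)) * (1.0 + abs(r)))
           for i, r in zip(influence, residuals)]
    total = sum(raw)
    return [w / total for w in raw]


def pps_bootstrap(weights, n_draws, rng):
    """Draw indices with replacement, with probability proportional to weight."""
    return rng.choices(range(len(weights)), weights=weights, k=n_draws)


# Toy data: the last point is labeled 1 but the model predicts P(y=1) = 0.05,
# so it looks like label noise.
labels = [1, 1, 0, 0, 1]
probs = [0.9, 0.8, 0.2, 0.1, 0.05]        # assumed logistic-model outputs
influence = [0.0, 0.0, 0.0, 0.0, 0.2]     # assumed leave-one-out scores

residuals = [deviance_residual(y, p) for y, p in zip(labels, probs)]
weights = combine_weights(influence, residuals)
sample = pps_bootstrap(weights, 10, random.Random(0))
```

In this sketch each tree of the forest would be grown on its own `pps_bootstrap` draw instead of a uniform bootstrap sample, so the suspicious fifth point is selected less often than the others.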