• P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Article

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: Supplementary 8, Pages: 1-6

Original Article

Improving Classification Accuracy based on Random Forest Model through Weighted Sampling for Noisy Data with Linear Decision Boundary

Abstract

Background: Random forest algorithms tend to use a simple random sampling of observations in building their decision trees. The random selection has the chance for noisy, outlier and non informative data to take place during the construction of trees. This leads to inappropriate and poor ensemble classification decision. This paper aims to optimize, the sample selection through probability proportional to size sampling (weighted sampling) in which the noisy, outlier and non informative data points are down weighted to improve the classification accuracy of the model. Methods: The weights of each data pointis determined in two aspects,finding each data pointinfluence on the modelthrough Leave-One-Out method using a single classification tree and measuring the deviance residual of each data point using logistic regression model, these are combined as the final weight. Results: The proposed Finest Random Forest (FRF) performs consistently better than the conventional Random Forest (RF) in terms of classification accuracy. Conclusion: The classification accuracy is improved when random forest is composed with probability proportional to size sampling (weighted sampling) for noisy data with linear decision boundary.
Keywords: Classification Accuracy, Decision Trees, Noisy Data, Outlier, Random Forest, Weighted Sampling

DON'T MISS OUT!

Subscribe now for latest articles and news.