Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: Supplementary 8, Pages: 1-6
S. Sasikala¹, S. Bharathidason²* and C. Jothi Venkateswaran³
¹ Department of Computer Science, IDE, University of Madras, Chennai - 600005, India
² Department of Computer Science, Loyola College, Chennai, India
³ Department of Computer Science, Presidency College, Chennai, India
Background: Random forest algorithms typically use simple random sampling of observations to build their decision trees. Random selection lets noisy, outlying and non-informative data points enter the construction of the trees, which can lead to poor ensemble classification decisions. This paper aims to optimize sample selection through probability proportional to size sampling (weighted sampling), in which noisy, outlying and non-informative data points are down-weighted to improve the classification accuracy of the model.
Methods: The weight of each data point is determined from two measures: the point's influence on the model, found through the Leave-One-Out method using a single classification tree, and the point's deviance residual under a logistic regression model. The two measures are combined into the final weight.
Results: The proposed Finest Random Forest (FRF) performs consistently better than the conventional Random Forest (RF) in terms of classification accuracy.
Conclusion: Classification accuracy improves when random forest is combined with probability proportional to size sampling (weighted sampling) for noisy data with a linear decision boundary.
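The weighting step described in Methods can be sketched in plain Python (standard library only). This is a minimal illustration under stated assumptions, not the authors' FRF implementation. The abstract does not give the depth of the single classification tree, the exact influence measure, or the rule for combining the two measures. The sketch therefore assumes a depth-1 tree (a decision stump), treats a point as influential when a tree trained without it misclassifies it, and uses a hypothetical combination rule in which the weight is proportional to penalty / (1 + r_i²). Here r_i is the standard logistic deviance residual, r_i = sign(y_i − p_i)·sqrt(−2[y_i log p_i + (1 − y_i) log(1 − p_i)]). All function names and the `loo_penalty` parameter are hypothetical. The normalized weights then drive a probability-proportional-to-size bootstrap through `random.Random.choices`.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    # Batch gradient descent on the mean log-loss; returns (weights, bias).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def deviance_residual(yi, p, eps=1e-12):
    # r_i = sign(y_i - p_i) * sqrt(-2 [y_i log p_i + (1 - y_i) log(1 - p_i)])
    p = min(max(p, eps), 1.0 - eps)
    ll = yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return math.copysign(math.sqrt(-2.0 * ll), yi - p)


def fit_stump(X, y):
    # ASSUMPTION: the "single classification tree" is a depth-1 tree here:
    # the (feature, threshold) split with the fewest training errors, where
    # each side predicts its majority label.
    def majority(labels):
        return 1 if 2 * sum(labels) > len(labels) else 0

    best = None
    for j in range(len(X[0])):
        for t in sorted({xi[j] for xi in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            lab_l, lab_r = majority(left), majority(right)
            errors = sum(v != lab_l for v in left) + sum(v != lab_r for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, lab_l, lab_r)
    _, j, t, lab_l, lab_r = best
    return lambda x: lab_l if x[j] <= t else lab_r


def loo_misclassified(X, y):
    # Leave-One-Out: refit the tree without point i, then test it on point i.
    flags = []
    for i in range(len(X)):
        tree = fit_stump(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        flags.append(tree(X[i]) != y[i])
    return flags


def sampling_weights(X, y, loo_penalty=0.5):
    # HYPOTHETICAL combination rule: shrink each weight by 1 / (1 + r_i^2) and
    # multiply by loo_penalty when the leave-one-out tree misclassifies point i.
    w, b = fit_logistic(X, y)
    probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
    resid = [deviance_residual(yi, p) for yi, p in zip(y, probs)]
    flags = loo_misclassified(X, y)
    raw = [(loo_penalty if f else 1.0) / (1.0 + r * r) for r, f in zip(resid, flags)]
    total = sum(raw)
    return [v / total for v in raw]


def weighted_bootstrap(weights, rng):
    # Probability-proportional-to-size draw of n indices with replacement,
    # replacing the uniform bootstrap used when growing each tree.
    n = len(weights)
    return rng.choices(range(n), weights=weights, k=n)


if __name__ == "__main__":
    X = [[-3.0], [-2.5], [-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [2.5], [3.0]]
    y = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1]  # indices 1 and 8 are deliberately mislabelled
    weights = sampling_weights(X, y)
    print([round(v, 3) for v in weights])
    print(weighted_bootstrap(weights, random.Random(0)))
```

In a forest, each tree would be grown on its own `weighted_bootstrap` draw instead of a uniform bootstrap. On the toy data, the two deliberately mislabelled points (indices 1 and 8) receive the smallest weights, so in expectation they are drawn less often than the clean points. The clean point at x = −1.0 is also flagged by the leave-one-out check because it sits next to the decision boundary. Its small deviance residual keeps its weight above that of the mislabelled points, which illustrates why combining the two measures can be preferable to using either one alone.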
Keywords: Classification Accuracy, Decision Trees, Noisy Data, Outlier, Random Forest, Weighted Sampling