P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2023, Volume: 16, Issue: 29, Pages: 2244-2251

Original Article

An Efficient Outlier Detection Using Isolation Forest Based on Robust Scaling and Principal Component Analysis for the Prediction of Anxiety Disorder

Received Date: 19 March 2023, Accepted Date: 03 July 2023, Published Date: 05 August 2023


Objectives: To develop a model for the prediction of anxiety disorder. Because the presence of outliers may affect model performance, the main consideration here is the removal of outliers. Methods: This study proposes an outlier detection method, IF-RSPCA, in which outliers are handled in two phases. In the first phase, the impact of outliers is reduced using the Inter Quartile Range (IQR), and dimensionality reduction is then performed using Principal Component Analysis, which can also reduce training time. In the second phase of IF-RSPCA, any outliers that still exist are detected and removed. Findings: For the performance evaluation, two datasets are generated: one using the conventional isolation forest and the other using IF-RSPCA. Both datasets are tested using the K Neighbors, Decision Tree and Naïve Bayes classifiers; the highest accuracies obtained are 96.10%, 93.80% and 90.2%, respectively. The dataset generated using the proposed method is found to perform well. Compared with previous works on survey datasets, the proposed model identifies anxiety more accurately. Novelty: The conventional isolation forest algorithm is modified using the Inter Quartile Range, thereby improving the performance measures. This helps to remove all the outliers completely. Previous models for the prediction of anxiety disorder were developed without performing outlier detection; hence, chances of misclassification exist. In several previous works, isolation forest was combined with clustering, but that combination is not capable of removing all the outliers. In these cases, outlier handling is performed in a single phase only.
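The two-phase procedure described in the Methods can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of the survey dataset, and the function name `if_rspca_filter`, the number of components, and the contamination rate are hypothetical choices that the abstract does not specify.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import RobustScaler


def if_rspca_filter(X, n_components=2, contamination=0.05, random_state=0):
    """Hypothetical helper: return (reduced inlier rows, boolean inlier mask)."""
    # Phase 1a: robust scaling centres on the median and divides by the IQR,
    # so extreme values have less influence than with mean/std scaling.
    X_scaled = RobustScaler(quantile_range=(25.0, 75.0)).fit_transform(X)
    # Phase 1b: PCA reduces dimensionality (assumed: 2 components).
    X_reduced = PCA(n_components=n_components,
                    random_state=random_state).fit_transform(X_scaled)
    # Phase 2: isolation forest labels remaining outliers -1 and inliers +1.
    labels = IsolationForest(contamination=contamination,
                             random_state=random_state).fit_predict(X_reduced)
    mask = labels == 1
    return X_reduced[mask], mask


# Synthetic stand-in data (assumption): 300 samples, 6 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[:10] += 15.0  # inject a few gross outliers

X_clean, mask = if_rspca_filter(X)
y_clean = y[mask]

# Evaluate one of the classifiers named in the abstract on the cleaned data.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_clean, y_clean, test_size=0.3, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(f"kept {mask.sum()} of {len(mask)} rows, KNN accuracy = {accuracy:.3f}")
```

Replacing `KNeighborsClassifier` with `DecisionTreeClassifier` or `GaussianNB` would cover the other two classifiers mentioned in the Findings; the accuracies on synthetic data are not comparable to those reported in the abstract.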

Keywords: Principal Component Analysis; Isolation Forest; Robust Scaling; K Nearest Neighbors; Decision Tree




© 2023 Prajesha & Veni. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

