An Empirical Study to Analyse The Effect of Bagging and Feature Subspacing on The Performance of A Custom Ensemble Algorithm for Predicting Drug Protein Interactions

Harshita Bhargava; Amita Sharma; Prashanth Suravajhala

doi:10.17485/IJST/v17i10.3202

Indian Journal of Science and Technology

Year: 2024, Volume: 17, Issue: 10, Pages: 911-916

Original Article

Harshita Bhargava¹*, Amita Sharma¹, Prashanth Suravajhala²,³

¹Department of Computer Science & IT, IIS (deemed to be University), Jaipur, Rajasthan, India
²Amrita School of Biotechnology, Amrita University, Clappana, Kollam, 690525, Kerala, India
³Bioclues.org, India

*Corresponding Author
Email: [email protected]

Received Date: 12 December 2023, Accepted Date: 30 January 2024, Published Date: 27 February 2024

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: To analyse the effect of bagging and feature subspacing on the performance of a custom ensemble of decision tree classifiers for predicting drug-protein interactions. Methods: We designed a custom ensemble algorithm with decision trees as the base learner and analysed the effect of bagging negative samples and of feature subspacing on its performance in terms of AUCROC and AUPR. The Enzyme dataset from the Yamanishi dataset, composed of 445 drugs and 664 proteins, was used for the experiments. Findings: Bagging negative samples had a significant effect on AUPR compared with feature subspacing. Because AUPR is computed from precision and recall and gives no credit for true negatives, the increase in AUPR as the negative-to-positive ratio increased indicates that the negative samples contain positives that are unknown and yet to be verified. Novelty: The results strongly indicate that feature subspacing has no considerable impact on the AUCROC of the custom ensemble, whereas AUPR increases as the negative-to-positive ratio increases. They also support the view that finding reliable negative samples among the entire set of negative drug-protein pairs can further enhance the performance of machine learning classifiers.
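The abstract gives no implementation details, so the following is only a minimal sketch of how bagging of negative samples and feature subspacing might be combined in a decision tree ensemble and scored with AUCROC and AUPR. It assumes NumPy and scikit-learn. The function names `fit_ensemble` and `predict_scores`, the parameters `neg_pos_ratio` and `feature_frac`, the choice to keep every positive in each bag, sampling negatives with replacement, averaging of tree probabilities, and the synthetic data are all illustrative assumptions. None of them is taken from the authors' algorithm or from the Yamanishi Enzyme dataset.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.tree import DecisionTreeClassifier


def fit_ensemble(X, y, n_trees=25, neg_pos_ratio=1.0, feature_frac=0.5, seed=0):
    """Fit decision trees, each on all positives plus a bag of negatives,
    restricted to a random subset of the feature columns (assumed design)."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_neg = max(1, int(round(neg_pos_ratio * len(pos))))
    n_feat = max(1, int(round(feature_frac * X.shape[1])))
    members = []
    for _ in range(n_trees):
        # Bagging of negatives: a bootstrap sample drawn with replacement.
        neg_bag = rng.choice(neg, size=n_neg, replace=True)
        rows = np.concatenate([pos, neg_bag])
        # Feature subspacing: this tree sees only a random subset of columns.
        cols = rng.choice(X.shape[1], size=n_feat, replace=False)
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 31)))
        tree.fit(X[np.ix_(rows, cols)], y[rows])
        members.append((tree, cols))
    return members


def predict_scores(members, X):
    """Average the positive-class probability over all trees."""
    probs = [tree.predict_proba(X[:, cols])[:, 1] for tree, cols in members]
    return np.mean(probs, axis=0)


# Synthetic, imbalanced toy data standing in for drug-protein pair features.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=2000) > 2.0).astype(int)
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

# Vary the negative-to-positive ratio and report both metrics.
for ratio in (1, 2, 5):
    members = fit_ensemble(X_train, y_train, neg_pos_ratio=ratio)
    scores = predict_scores(members, X_test)
    auc_roc = roc_auc_score(y_test, scores)
    # average_precision_score is a step-wise estimate of the area under the PR curve.
    aupr = average_precision_score(y_test, scores)
    print(f"ratio={ratio} AUCROC={auc_roc:.3f} AUPR={aupr:.3f}")
```

Varying `feature_frac` instead of `neg_pos_ratio` in the final loop gives the corresponding feature subspacing comparison. Numbers from this toy data say nothing about the results reported in the paper.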

Keywords: Decision tree classifier, Ensemble classifier, Drug discovery, Bagging, Drug repurposing

Copyright

© 2024 Bhargava et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)