Using XAI Techniques to Persuade Text Classifier Results: A Case Study of Covid-19 Tweets

S Hemdan Hamed; Hazem Elbakry; Haitham Elghareeb; Sara Elhishi

doi:10.17485/IJST/v15i30.397

Indian Journal of Science and Technology

Year: 2022, Volume: 15, Issue: 30, Pages: 1484-1494

Original Article

Department of Information Systems, Faculty of Computers and Information, Mansoura University, Mansoura, Egypt

Corresponding author: S Hemdan Hamed

Received: 17 February 2022; Accepted: 14 July 2022; Published: 09 August 2022

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background: To offer a transparent decision support system capable of classifying the sentiment of tweets as positive, neutral, or negative and of explaining each prediction with XAI techniques.

Methods: We began with a data preprocessing phase. For data representation, we used TF-IDF. We then applied four machine learning algorithms (Naive Bayes, random forest, logistic regression, and support vector machine) and four deep learning models (RNN, LSTM, GRU, and bidirectional RNN). To increase trust in the models, we used LIME and SHAP to explain their predictions.

Findings: The empirical results show that the logistic regression and SVM models using TF-IDF features performed best among the models compared, with average accuracies of 84% and 86%, respectively. The data balancing step raised the accuracy of the random forest model from 47% to 73%, while the other models changed only slightly. On dataset 2, the deep learning models performed better than the traditional machine learning models: LSTM and GRU achieved approximately 78% accuracy, and the bidirectional RNN achieved 79%.

Novelty and applications: We propose a highly accurate sentiment analysis (SA) approach and test it on two datasets. To increase trust in the model, the approach also explains each predicted sentiment.

Keywords: Explainable Artificial Intelligence (XAI); Sentiment Analysis; COVID-19; Deep Learning; Machine Learning
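The abstract names the main pipeline stages (TF-IDF features, a data balancing step, a linear classifier such as logistic regression, and a SHAP explanation of each prediction) without showing how they connect. The following is a minimal, dependency-free Python sketch of those stages, not the authors' implementation. Several assumptions are worth flagging: the toy tweets are invented, tokenization is plain whitespace splitting, balancing uses random oversampling (the abstract does not state which balancing method was used), logistic regression is fitted with plain SGD rather than a library solver, and the explanation uses the closed-form SHAP values of a linear model under feature independence, phi_t = w_t * (x_t - mean_t), computed by hand instead of calling the LIME or SHAP packages. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization only; real tweet
    # preprocessing would also strip URLs, mentions and punctuation.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf):
    # L2-normalized TF-IDF as a sparse dict; out-of-vocabulary tokens are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def oversample(docs, labels, seed=0):
    # Assumption: random oversampling. Duplicates minority-class examples
    # until every class is as large as the biggest one.
    rng = random.Random(seed)
    by_class = {}
    for d, y in zip(docs, labels):
        by_class.setdefault(y, []).append(d)
    target = max(len(ds) for ds in by_class.values())
    out_docs, out_labels = [], []
    for y, ds in by_class.items():
        for d in ds + [rng.choice(ds) for _ in range(target - len(ds))]:
            out_docs.append(d)
            out_labels.append(y)
    return out_docs, out_labels

def scores(x, W, b):
    # Linear class scores (logits): b_c + sum_t W_c[t] * x[t].
    return {c: b[c] + sum(W[c].get(t, 0.0) * v for t, v in x.items()) for c in W}

def train_logreg(X, y, classes, epochs=200, lr=0.5):
    # Multinomial logistic regression fitted with plain per-example SGD.
    W = {c: {} for c in classes}
    b = {c: 0.0 for c in classes}
    for _ in range(epochs):
        for x, label in zip(X, y):
            s = scores(x, W, b)
            m = max(s.values())  # subtract the max for a stable softmax
            z = sum(math.exp(v - m) for v in s.values())
            for c in classes:
                # Gradient of cross-entropy w.r.t. the class-c logit.
                g = math.exp(s[c] - m) / z - (1.0 if c == label else 0.0)
                b[c] -= lr * g
                for t, v in x.items():
                    W[c][t] = W[c].get(t, 0.0) - lr * g * v
    return W, b

def predict(x, W, b):
    s = scores(x, W, b)
    return max(s, key=s.get)

def linear_shap(x, W, label, X_background):
    # Closed-form SHAP values of a linear logit under feature independence:
    # phi_t = W_label[t] * (x[t] - mean_t), mean taken over X_background.
    n = len(X_background)
    return {t: w * (x.get(t, 0.0) - sum(xb.get(t, 0.0) for xb in X_background) / n)
            for t, w in W[label].items()}

# Hypothetical toy tweets, invented for illustration (not from the paper's datasets).
docs = ["vaccine works great news", "great recovery rates today",
        "hospitals overwhelmed terrible news", "terrible lockdown sad day",
        "new case count reported today"]
labels = ["positive", "positive", "negative", "negative", "neutral"]

idf = fit_idf(docs)  # fit IDF before oversampling so duplicates do not skew df
train_docs, train_labels = oversample(docs, labels)
X = [tfidf(d, idf) for d in train_docs]
W, b = train_logreg(X, train_labels, sorted(set(labels)))

x = tfidf("vaccine works great news", idf)
label = predict(x, W, b)
phi = linear_shap(x, W, label, X)
top = sorted(phi.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(label, top)
```

Because the explained quantity is the class logit, the attributions sum exactly to the difference between the logit for the tweet and the logit for the average training vector, which is the local-accuracy property SHAP guarantees. In a real experiment, oversampling would be applied to the training split only, so that duplicated tweets cannot leak into the test set.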

Copyright

© 2022 Hamed et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published by the Indian Society for Education and Environment (iSee)