Detection of Hate Speech Text in Afan Oromo Social Media using Machine Learning Approach

Naol Bakala Defersha; Kula Kekeba Tune

doi:10.17485/IJST/v14i31.1019

Indian Journal of Science and Technology

Year: 2021, Volume: 14, Issue: 31, Pages: 2567-2578

Original Article

Naol Bakala Defersha¹*, Kula Kekeba Tune²

¹Core Member, Center of Excellence for HPC and Big Data Analytics; Ph.D. Student, Assistant Professor, Department of Software Engineering, College of Electrical and Mechanical Engineering, Addis Ababa Science and Technology University, Addis Ababa, 16417, Ethiopia
²Head, Center of Excellence for HPC and Big Data Analytics; Assistant Professor, Department of Software Engineering, College of Electrical and Mechanical Engineering, Addis Ababa Science and Technology University, Addis Ababa, 16417, Ethiopia

*Corresponding Author
Email: [email protected]

Received Date: 04 June 2021, Accepted Date: 16 August 2021, Published Date: 22 September 2021

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: This study aims to develop a hate speech detection model for Afan Oromo texts on social networks such as Facebook and Twitter using machine learning algorithms.

Methods: We collected comments and posts from the Facebook and Twitter pages of BBC Afan Oromo, OBN Afan Oromo, the Fana Afan Oromo Program, politicians, activists, religious men, and the Oromia Communication Bureau using the Facepager tool. The collected data was labelled using an Afan Oromo hate speech evaluation system that we developed. Text preprocessing was applied to remove special characters, stop-words, HTML tags, extra whitespace, and numbers, and to lemmatize the text. N-gram and TF-IDF features were then extracted to obtain a benchmark Afan Oromo hate speech detection dataset, which we split into training and test sets. Finally, we trained Support Vector Classifier, Multinomial NB, Linear Support Vector Classifier, Logistic Regression, Decision Tree, and Random Forest Classifier models on the training set (67% of the data). We evaluated the performance of the resulting models on the test set using the F-score.

Findings: Hate speech on social media undermines the welfare of ethnic groups and the ability of citizens to live together. Much research has been done on detecting hateful content on social media for English, Amharic, and other languages. This study focused on developing a prototype Afan Oromo hate speech detection model using machine learning algorithms and evaluating its performance; the Linear Support Vector Classifier achieved the highest F1-score, 64%.

Novelty: An Afan Oromo hate speech detection framework was proposed and successfully implemented to develop the detection model. We wrote Python scripts that handle typos in Afan Oromo text and that recognize the apostrophe (’) as an important letter in Afan Oromo word formation. To date, no researchers have used a combination of n-gram and TF-IDF for feature extraction. In this study, n-gram and TF-IDF features were used to build a model that detects Afan Oromo hate speech on social media.

Keywords: Afan Oromo; Decision Tree; Facebook; Hate Speech; Linear Support Vector Classifier; Machine Learning; Multinomial NB; Social Media; Support Vector Classifier; Random Forest Classifier
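
The preprocessing, feature-extraction, and scoring steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' script: the function names, the three-word stop-word list, the sample sentences, and the smoothed IDF formula are all assumptions made for illustration, and lemmatization and typo correction are omitted because the abstract does not describe how they were done. The one detail taken directly from the abstract is that the apostrophe is kept as part of a word rather than stripped as a special character.

```python
import html
import math
import re
from collections import Counter

# Hypothetical placeholder stop-words; the paper's actual list is not given.
STOP_WORDS = {"fi", "kan", "akka"}


def preprocess(text):
    """Return lowercase tokens with tags, digits, and symbols removed."""
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    text = text.replace("\u2019", "'")           # normalise the curly apostrophe
    text = re.sub(r"[^A-Za-z'\s]", " ", text)    # drop digits and special characters
    tokens = text.lower().split()                # split() also collapses whitespace
    tokens = [t.strip("'") for t in tokens]      # keep apostrophes only inside words
    return [t for t in tokens if t and t not in STOP_WORDS]


def word_ngrams(tokens, n_max=2):
    """Return all word n-grams for n = 1..n_max, joined by spaces."""
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs, n_max=2):
    """Return one {ngram: weight} dict per document.

    Assumed weighting: raw count * (ln((1 + N) / (1 + df)) + 1).
    """
    counts_per_doc = [Counter(word_ngrams(preprocess(d), n_max)) for d in docs]
    # Document frequency: each n-gram is counted once per document it occurs in.
    df = Counter(g for counts in counts_per_doc for g in counts)
    n_docs = len(docs)
    return [
        {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1) for g, tf in counts.items()}
        for counts in counts_per_doc
    ]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up neutral sample sentences, used only to exercise the functions.
    samples = ["<p>Mucaan  mana ba'e!! 123</p>", "Mucaan mana jira"]
    print(preprocess(samples[0]))
    print(sorted(tfidf(samples)[0]))
```

In practice the resulting weight dictionaries would be turned into feature vectors for the classifiers listed in the abstract. A library vectorizer such as scikit-learn's `TfidfVectorizer` with its `ngram_range` parameter covers the same step, though the abstract does not state which implementation was used.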

Copyright

© 2021 Defersha & Tune. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)