P-ISSN: 0974-6846, E-ISSN: 0974-5645

Indian Journal of Science and Technology



Year: 2023, Volume: 16, Issue: 41, Pages: 3704-3713

Original Article

Hate Speech Detection based on Word Embedding and Linguistic Features

Received Date: 22 August 2023, Accepted Date: 03 October 2023, Published Date: 12 November 2023


Objectives: To develop an improved hate speech detection method based on word embedding and linguistic features. Methods: Several machine-learning classifiers, including Logistic Regression (LR), Gaussian Naive Bayes (GNB), Random Forest (RF), K-Nearest Neighbor (KNN) and Linear Support Vector Classifier (SVC), are trained on linguistic data to identify hate speech. Two datasets are used: the Tweet hate speech detection dataset (24783 tweets) and the Hasoc2019 dataset (6977 tweets). Both datasets are split 67/33, with 67% of each dataset used for training and 33% for testing. Findings: On the Tweet hate speech detection dataset, the Random Forest classifier achieves the highest accuracy of 0.90, with the highest precision, recall and F1-score values of 0.87, 0.85 and 0.90 respectively for class 0, 0.98, 0.98 and 0.93 respectively for class 1, and 0.86, 0.85 and 0.74 respectively for class 2. On the Hasoc2019 dataset, the Random Forest classifier with linguistic features and the TF-IDF word embedding technique achieves the highest accuracy of 0.99, with the highest precision, recall and F1-score values of 1.00, 0.99 and 1.00 for class 0 and 1.00, 0.99 and 0.99 for class 1. Novelty: The combination of twenty linguistic features with the term frequency-inverse document frequency (TF-IDF) word embedding technique makes this research unique. The twenty linguistic features used to detect hateful content are drawn from three groups of attributes: complexity attributes, stylometric attributes and psycho-linguistic attributes.

Keywords: Machine Learning Classifiers, Linguistic Features, Accuracy, TF-IDF, Random Forest Classifier
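
As a rough illustration of the feature construction described in the abstract, the following Python sketch concatenates TF-IDF weights with a few hand-crafted linguistic features and applies a 67/33 split. It is a minimal sketch under stated assumptions, not the authors' implementation: the three features (one per attribute group), the toy tweets, the IDF smoothing and all function names are hypothetical, and the twenty features used in the paper are not listed on this page. The resulting vectors would then be passed to a classifier such as Random Forest, which is not shown here.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def linguistic_features(text):
    """Three illustrative features, one per attribute group (all assumed)."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    avg_word_len = sum(len(t) for t in tokens) / n                 # complexity
    exclamations = float(text.count("!"))                          # stylometric
    second_person = sum(t in {"you", "your"} for t in tokens) / n  # psycho-linguistic proxy
    return [avg_word_len, exclamations, second_person]


def tfidf_vectors(docs):
    """Dense TF-IDF vectors; the '+ 1.0' IDF smoothing is an assumption."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(len(docs) / doc_freq[t]) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = max(len(toks), 1)
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def split_67_33(rows, seed=0):
    """Shuffle and split rows into 67% training and 33% testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * 0.67)
    return rows[:cut], rows[cut:]


# Toy placeholder data; labels are hypothetical (1 = offensive, 0 = not).
tweets = [
    "You are awful!!",
    "Lovely weather in the city today",
    "I cannot stand your kind!",
    "Great match last night",
    "You people ruin everything!!!",
    "Reading a good book this weekend",
]
labels = [1, 0, 1, 0, 1, 0]

vocab, tfidf = tfidf_vectors(tweets)
# Append the linguistic features to each TF-IDF vector.
features = [tv + linguistic_features(t) for tv, t in zip(tfidf, tweets)]
train, test = split_67_33(zip(features, labels))
print(len(train), len(test))  # prints: 4 2
```

Each row is a TF-IDF vector followed by the linguistic features, so a single classifier sees both the lexical and the stylistic signals at once.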




© 2023 Jain & Sharma. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

