P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2020, Volume: 13, Issue: 40, Pages: 4202-4215

Review Article

Bag-of-Phrases (BoPh) and sentiment analysis of Arabic text in Twitter

Received Date: 30 July 2020, Accepted Date: 08 November 2020, Published Date: 20 November 2020


Background/Objectives: Sentiment analysis plays a major role in many text mining problems. Although Arabic text mining is important, particularly for sentiment analysis, and bears on many issues in Arabic-speaking countries, research on it remains scarce. Arabic has many dialects that people use to express their feelings on social media. The objective of this study is to perform an experiment that extracts the subjective opinion from a text. Subjective analysis is one way to improve the accuracy of sentiment results for texts written in dialects, such as the Saudi dialect, in which words can carry various hidden meanings.

Methods/Statistical analysis: We manually annotated more than 8,000 tweets to build training and testing data sets of positive or negative words and phrases. Because the bag-of-words method is not sufficient in many cases, we proposed a “Bag of Phrases” (BoPh) methodology for analyzing the sentiment of the texts, which helped to improve the performance of sentiment analysis. We applied a Naive Bayes algorithm to test the method.

Findings: The results show an accuracy (true positives plus true negatives) of about 84% when compared against the manual annotation. This accuracy is calculated after taking into account the margin of error due to the manual annotation step and the annotators' subjective interpretation of the texts.

Novelty/Applications: The novelty of the study lies in a training data set that is more accurate than those of other works on the Saudi dialect of Arabic, and in proposing the BoPh concept.

Keywords: Sentiment analysis; Saudi dialect; Arabic text analysis; Twitter data analysis; bag of phrases
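
The idea summarized in the abstract, phrase features fed to a Naive Bayes classifier and scored by accuracy, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a "phrase" can be approximated by a word n-gram, whereas the study relied on manually annotated phrases. The English toy strings stand in for annotated tweets and are not taken from the study's data set. The names `extract_phrases`, `NaiveBayes`, `accuracy`, `train`, and `test` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_phrases(text, max_n=2):
    """Return all word n-grams with n = 1..max_n (an assumed stand-in for 'phrases')."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing over phrase counts."""

    def __init__(self, max_n=2, alpha=1.0):
        self.max_n = max_n  # max_n=1 gives bag of words; max_n>=2 adds phrases
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(extract_phrases(text, self.max_n))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for phrase in extract_phrases(text, self.max_n):
                if phrase in self.vocab:  # features unseen in training are skipped
                    score += math.log((counts[phrase] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(gold, predicted):
    """(TP + TN) / total for a two-class problem."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical toy data: placeholders, not tweets from the study.
train = [
    ("good service", "pos"), ("very good", "pos"), ("not bad", "pos"),
    ("not good", "neg"), ("bad service", "neg"), ("very bad", "neg"),
]
test = [("service not bad", "pos"), ("service not good", "neg")]

texts, labels = zip(*train)
gold = [label for _, label in test]
for max_n in (1, 2):
    model = NaiveBayes(max_n=max_n).fit(texts, labels)
    predicted = [model.predict(text) for text, _ in test]
    print(f"max_n={max_n}: accuracy={accuracy(gold, predicted):.2f}")
```

On this toy data the bag-of-words model (`max_n=1`) misreads both test strings, because "bad" and "good" dominate on their own, while the phrase model (`max_n=2`) uses "not bad" and "not good" as single features and classifies both correctly. This says nothing about the 84% figure reported in the abstract, which depends on the study's own data and annotation.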




© 2020 Alshammari. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee).

