Recently, sentiment analysis has played a major role in achieving different goals in many organizations, for example, using customer reviews to develop or improve specific services or goods. Several critical points must be considered to achieve this, such as the source of the collected data, the volume and comprehensiveness of the data, and the ability to process the data to extract results from it.
Depending on various factors, such as the form of the data and the language of the text, the sentiment analysis process may or may not obtain good results. Processing Arabic text data might therefore require more effort than processing English text data because of the language rules, the limited availability of data in Arabic, and the paucity of studies conducted on Arabic sentiment analysis.
The complexity of working with the Arabic language also arises from its large number of dialects.
Text sentiment analysis focuses on exploring the meaning behind the text. Therefore, the accuracy of the results is tied to the language and its meanings.
The manual annotation process plays a major role in building a strong base for sentiment analysis.
We explored a very helpful review paper about Arabic Sentiment Analysis (ASA).
We targeted Twitter text data as an open public environment that includes most people's opinions about various current issues. The complexity of analyzing tweets arises from dialect differences and the use of new phrases among young people. The Arabic language is also spoken by people from different countries and cultures.
We accessed the Twitter API and downloaded more than 20,000 tweets, collected by searching for seven issues that were common at the time: two controversial public figures (أحمد العيسى and تركي ال الشيخ), two sports clubs (الهلال and النصر), two e-commerce websites (جولي شيك and نون), and one common employment exam (كفايات).
Since these keywords were the most active at that time, it should be noted that the collected data were limited to the area of Saudi Arabia and to these specific topics. Therefore, we studied neither the differentiation in dialects nor other issues or topics in other Arab countries.
After removing retweets, we ended up with 8,247 valid tweets to be analyzed. Cleaning and preparing the data for analysis is an important step in the study. We therefore applied several formatting rules, such as removing non-Arabic letters, numbers, control characters, and graphics. However, we kept punctuation marks such as question and exclamation marks, because they might reflect sentiment.
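A minimal sketch of such a cleaning step, assuming Python with the standard `re` module; the Arabic letter range used here (U+0621–U+064A) is a simplification, and the function is illustrative rather than the exact rule set used in the study:

```python
import re

def clean_tweet(text):
    """Keep Arabic letters, whitespace, and sentiment-bearing punctuation
    (question and exclamation marks); drop everything else."""
    # Replace disallowed characters (Latin letters, digits, URL parts,
    # symbols) with a space, then collapse runs of whitespace.
    text = re.sub(r"[^\u0621-\u064A\s?!؟]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

clean_tweet("RT @user: يجنن 100% !!!")  # retains only the Arabic word and "!!!"
```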
Further preparation rules could also be applied, such as normalization, which unifies letters that may appear in different forms, for example, the Aleph letter with Hamza (أ) or without (ا), and the letter "Ya" with two dots (ي) or without (ى), and so on. However, some of these changes can affect the real meaning, which is one of the complexities of working with Arabic text. We completed the normalization manually while annotating the tweets.
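One common way to implement such normalization is a simple substitution table; the sketch below assumes that variants are always unified, which, as noted, can sometimes change the meaning:

```python
def normalize(text):
    """Unify Arabic letter variants: Hamza/Madda Aleph forms to bare Aleph,
    and the undotted final Ya (ى) to the dotted form (ي)."""
    for src, dst in (("أ", "ا"), ("إ", "ا"), ("آ", "ا"), ("ى", "ي")):
        text = text.replace(src, dst)
    return text

normalize("أحمد")  # -> "احمد"
```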
The manual annotation approach relies mainly on understanding the sentences and the meaning behind them; in sentiment analysis this is called aspect-level sentiment analysis.
After cleaning the tweets, we started annotating them based on five categories, as follows:
During the annotation process, we came across some tweets that were not clear. In such cases we tried to include them after having them re-read several times by different annotators. Some tweets were not related to the topic, such as ads and personal/general tweets; nevertheless, some of these were annotated because they still carried sentiment.
Many lexicons have been built to provide a high-level environment for processing Arabic sentiment.
For the clearly positive and clearly negative categories, the annotators wrote the sentiment word or words in dedicated columns created to store positive and negative words. A list of positive words and a list of negative words were thus compiled.
From all clearly positive and clearly negative annotated tweets, we built a positive bag-of-words and a negative bag-of-words (BoW).
1. At least one word, not exceeding three, was extracted from each tweet.
2. All the words from the seven files were compiled into one list.
3. Duplicate words were removed from the final list.
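The steps above can be sketched as follows, assuming each topic file has already been read into a Python list of words (function and variable names are illustrative):

```python
def compile_bow(per_topic_words):
    """Merge per-topic word lists into one deduplicated bag-of-words.
    dict.fromkeys removes duplicates while preserving first-seen order."""
    merged = []
    for words in per_topic_words:   # one list per topic file
        merged.extend(words)
    return list(dict.fromkeys(merged))

compile_bow([["جميل", "رائع"], ["رائع", "ممتاز"]])  # -> ["جميل", "رائع", "ممتاز"]
```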
It should be noted that the annotators performed the normalization manually during the annotation process, to ensure that the annotated words would not be affected by variations involving the Hamza (ء), dots (.), Madda (~), or duplicated letters. For example, in the word (يجنن), which means (very nice), some people duplicate the second-to-last letter several times, as in (يجننننن), to show emphasis or enthusiasm. Without this normalization the quality of the results could suffer, because people write sentiment words with these variations or repeat letters in a word to emphasize the meaning.
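Letter elongation in particular can be handled automatically; a sketch, under the assumption that collapsing any run of three or more identical letters down to two is safe for the words in question:

```python
import re

def collapse_repeats(word, max_run=2):
    """Collapse runs of 3+ identical letters (e.g. يجننننن -> يجنن) so that
    elongated spellings map to a single canonical form."""
    return re.sub(r"(.)\1{%d,}" % max_run, r"\1" * max_run, word)

collapse_repeats("يجننننن")  # -> "يجنن"
```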
Also, some words may have different meanings depending on their usage or context. It is common among certain Arab speakers to write a word that means the opposite of what is intended. This can certainly affect the sentiment of the text.
In the project, for the positive and negative categories only, the annotators wrote down the phrases that helped categorize the tweet as either positive or negative. A phrase may consist of one or more words that carry no sentiment alone but express a sentiment when they appear together in a sentence. Phrases may also contain common names or adjectives that carry either positive or negative sentiment in the culture, such as (حسبي الله) together with (كفايات): a tweet containing these two phrases most likely expresses a negative opinion.
For the neutral category, there were no sentiment words and no phrases. However, in some cases the annotators added notes indicating why they had annotated a tweet as neutral, for example, tweets containing a question mark. After completing the manual annotation process, we obtained a general idea of the most popular positive and negative words that people used in their tweets to express opinions about the tested issues.
Some of the results were shown in
We analyzed the first results, namely the statistics obtained from manual annotation, in order to compare them with results obtained from applying machine learning algorithms such as SVM, Naive Bayes, or other classification algorithms.
From the results, we note that most topics share one or more common words that are either positive or negative. For example, the word "ظلم", which means "injustice", appears among the common negative words in most topics. However, determining the common words shared across different topics is not generally feasible, because the common words depend on the topic domain, such as sports, politics, education, or markets.
The following charts show general statistics of the first results.
The following chart
From the above charts, we can see that the word "أفضل", which means "better", is shared between the two topics. Indeed, this word appears in the positive list of most topics, because it is one of the top positive words in most comparable topics.
Dealing with phrases to analyze sentiment in Arabic text is another goal of the project. We therefore manually categorized some phrases that might help analyze sentiment; the following chart gives an example.
For the purpose of our research, we tried to create a bag of phrases that could help in some areas of sentiment analysis. Two additional columns were therefore created in our work to record phrases from the tweets. These phrases could be used secondarily in the future to analyze sentiment in Arabic texts. Our expectations for this step are modest, but it might help in certain ways, given that opinion mining at the aspect level is not easy.
In addition, the phrases we collected might not include any sentiment word, which is what makes the phrase-based approach beneficial. This distinguishes our Bag-of-Phrases (BoPh) approach from traditional machine learning approaches such as Bag-of-Words (BoW). For example, in some topics we found phrases with negative meaning, such as "أغسل يدك", which is commonly used in Saudi culture as a negative expression; it contains no sentiment word, and it might even be understood as positive by those unfamiliar with local Arabic phrases.
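A bag-of-phrases lookup differs from a bag-of-words lookup mainly in that multi-word expressions are matched as whole units. A minimal sketch (the phrase list here is illustrative, not the full list built in the study):

```python
def match_phrases(tweet, phrase_list):
    """Return the phrases from a bag-of-phrases that occur in the tweet.
    Multi-word expressions such as "أغسل يدك" count as one sentiment unit."""
    return [p for p in phrase_list if p in tweet]

negative_phrases = ["أغسل يدك", "حسبي الله ونعم الوكيل"]
match_phrases("والله أغسل يدك من الموضوع", negative_phrases)  # -> ["أغسل يدك"]
```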
Also, in some cases we need to analyze sentiment for a specific topic or issue at different times, such as the beginning, middle, and end of the year. In such cases, the proposed approach (BoPh) would help and improve performance, as we noticed in our study.
As the above tables show, there is some duplication in the phrases, for one of the following reasons:
Within the same list (positive or negative), duplication arises from a difference of at least one letter, such as "حسبي" (singular) versus "حسبنا" (plural).
Across different lists, duplication arises from the generality of the words in the phrase. For example, "الحمدلله" appears in both lists, and its meaning depends on the surrounding words: "الحمدلله على كل حال" commonly has a negative connotation (appearing 29 times in the negative list) but sometimes a positive one (appearing 4 times in the positive list).
Some short phrases carry no meaning on their own and fall in both lists; combined with other words they form a sentiment phrase, e.g. "الحمدلله" becomes positive in "الحمدلله نجحت" and "الحمدلله عديت".
The following figures
In sentiment analysis there are different ways to discover a good solution for a specific issue. However, the nature of the issue or situation plays a major role in user sentiments; culture, language, and the domain of the issue are examples of influencing factors.
After building the corpus, a machine learning algorithm was applied to test our proposed approach, i.e., bag-of-phrases (BoPh). The total number of tweets after eliminating the "Neutral" sentiments from the list is 1,565 tweets that are either "Positive" or "Negative". We applied the concept of a Document-Term Matrix (DTM).
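A DTM maps each tweet to a row of term counts over the corpus vocabulary. A small self-contained sketch (whitespace tokenization is an assumption; the study's exact tokenizer is not specified here):

```python
def build_dtm(docs):
    """Build a document-term matrix: one row per document, one column per
    vocabulary term, with raw term counts in the cells."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for d in docs:
        row = [0] * len(vocab)
        for w in d.split():
            row[index[w]] += 1
        matrix.append(row)
    return vocab, matrix
```

In practice a sparse representation would be preferable for a 1,566 × 1,766 matrix, but the dense version shows the structure.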
The next tables:
We created a corpus from the tweets and then applied a Naive Bayes algorithm to examine our test data. The data comprised 1,566 documents (tweets) and 1,766 terms; the training data is a 1,000 × 1,766 matrix and the testing data a 566 × 1,766 matrix. The classifier was created from the tweets and the DTM table.
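The classifier itself can be sketched as a multinomial Naive Bayes with Laplace smoothing over the term counts; this is a from-scratch illustration of the technique, not the exact implementation used in the study:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Train multinomial Naive Bayes: log priors from class frequencies,
    per-class word log-likelihoods with Laplace (add-one) smoothing."""
    classes = sorted(set(labels))
    priors = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    vocab = set()
    for doc, y in zip(docs, labels):
        words = doc.split()
        counts[y].update(words)
        vocab.update(words)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)
        loglik[c] = {w: math.log((counts[c][w] + 1) / total) for w in vocab}
    return priors, loglik, vocab

def predict_nb(model, doc):
    """Score each class by its prior plus summed word log-likelihoods;
    unknown words are ignored."""
    priors, loglik, vocab = model
    scores = {}
    for c in priors:
        scores[c] = priors[c] + sum(
            loglik[c][w] for w in doc.split() if w in vocab)
    return max(scores, key=scores.get)

# Toy example with hypothetical labels:
model = train_nb(["جميل رائع", "رائع جدا", "سيء جدا", "سيء ممل"],
                 ["pos", "pos", "neg", "neg"])
predict_nb(model, "رائع")  # -> "pos"
```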
Although the first part of the research, the manual annotation, was completed by native Arabic speakers, there was some uncertainty in the results, particularly in those based on human interpretation of the topics. Some annotators annotated the tweets based on their own understanding of and emotions about the topic itself; annotators also sometimes read the same tweet from different perspectives. For example, in the "أحمد العيسى" topic, concerning a controversial figure who was the Minister of Education at the time, one annotator might label a tweet as positive while another labels the same tweet as negative, depending on their feelings about him. Recognizing these differences of opinion in the manual annotation process helps in understanding the nature of the results discussed in this section.
The research mainly focuses on building a new Bag-of-Words (BoW) and Bag-of-Phrases (BoPh) for Arabic text by collecting sentiment words and phrases from tweets. While a couple of bag-of-words lists have been provided by researchers in the field, most of them are either based on formal Arabic, which is not commonly used on social media nowadays, or suffer from significant gaps.
The results show that the BoPh model helps in classifying tweets according to different criteria, such as the field of the topic, the geographical area, etc. However, some phrases can be used in both positive and negative sentiments, such as "الحمدلله", which roughly means "we accept that anyway". Nevertheless, we can often determine the sentiment by applying a sliding-window technique over a limited number of surrounding words, e.g. "الحمدلله نجحت" for a positive sentiment or "الحمدلله على كل حال" for a negative one. Other phrases convey the sentiment directly regardless of topic, like a commonly used bad or good phrase such as "حسبي الله ونعم الوكيل".
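The sliding-window idea can be sketched as follows; the window size and the cue-word sets are illustrative assumptions, not values taken from the study:

```python
def window_sentiment(tweet, pivot, pos_cues, neg_cues, window=3):
    """Disambiguate an ambiguous phrase such as الحمدلله by inspecting the
    words in a small window after it; returns 'positive', 'negative', or
    'neutral' (None if the pivot is absent)."""
    words = tweet.split()
    if pivot not in words:
        return None
    i = words.index(pivot)
    context = words[i + 1 : i + 1 + window]
    if any(w in pos_cues for w in context):
        return "positive"
    if any(w in neg_cues for w in context):
        return "negative"
    return "neutral"

window_sentiment("الحمدلله نجحت", "الحمدلله", {"نجحت"}, {"حال"})  # -> "positive"
```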
The results also show that, applying a Naive Bayes algorithm to measure the accuracy of our model, the accuracy on the testing data is 84%, while the remaining 16% were either false positives or false negatives relative to the training data set. Since the study carries some uncertainty or margin of error in the input data because of the manual annotation, this accuracy can be considered acceptable.
In this study, manual annotation was applied to Arabic text data (tweets) in the Saudi dialect, and positive or negative sentiments were determined. We collected at most three positive or negative words from each tweet to create an Arabic bag of words. At the same time, not more than two phrases were collected from many of the tweets to create an Arabic Bag of Phrases (BoPh), which helps with common expressions.
Then, a Naïve Bayes algorithm was applied to test our model. The results are acceptable, indicating 84% accuracy after taking into account the margin of error due to the manual annotation step and the annotators' subjective interpretation of the texts. Nevertheless, the annotators took the utmost care to be as objective as possible in interpreting the text. Finally, this study achieved very high accuracy on the training data set for the Saudi dialect.