Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 46, Pages: 1-9
Rabab Ali Abumalloh1*, Hassan Maudi Al-Sarhan2 and Waheeb Abu-Ulbeh3
1University of Dammam, Department of Computer Science, Dammam − 31451, Saudi Arabia; [email protected]
2Ajloun National University, Information Technology, Ajloun − 26810, Jordan.
3Universiti Teknologi Malaysia, UTM Johor Bahru, 81310 Johor, Malaysia.
*Author for correspondence
Rabab Ali Abumalloh University of Dammam, Department of Computer Science, Dammam − 31451, Saudi Arabia; [email protected]
Objective: This paper aimed to review corpus linguistics sources related to part-of-speech tagging and to build an annotated corpus for the Arabic language that contains Arabic words and their grammatical tags. Methods/Statistical Analysis: An in-depth survey conducted by the authors showed a need for a freely available tagged Arabic corpus that can be used in natural language processing research. A corpus of 25,000 words written in Modern Standard Arabic was collected manually from different web sources. The collected words were then tagged using Arabic grammar books. Findings: The developed corpus can support researchers working on natural language processing applications. Applications/Improvements: The corpus needs to be expanded to include more words and their grammatical tags.
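As a rough illustration of how a word/tag corpus of this kind might be used, the sketch below trains a most-frequent-tag baseline, a common first step when evaluating part-of-speech taggers. It assumes a hypothetical tab-separated "word<TAB>tag" format and a hypothetical three-way VERB/NOUN/PARTICLE tag set (loosely following the traditional Arabic division into verb, noun and particle). The sample entries are invented for illustration; the paper's actual file format and tag set are not described here. The unvocalized form كتب is included because it can be read as a verb (kataba, "he wrote") or a noun (kutub, "books"), which is the kind of ambiguity a tagged corpus helps resolve.

```python
from collections import Counter, defaultdict

# Hypothetical sample in an assumed "word<TAB>tag" format; the tag names
# and entries are illustrative only, not taken from the paper's corpus.
SAMPLE = """كتب\tVERB
الطالب\tNOUN
في\tPARTICLE
الكتاب\tNOUN
كتب\tNOUN
كتب\tVERB
"""


def load_corpus(text):
    """Parse tab-separated word/tag lines into a list of (word, tag) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, tag = line.split("\t")
        pairs.append((word, tag))
    return pairs


def train_baseline(pairs):
    """Map each word to its most frequent tag and find the overall most frequent tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for word, tag in pairs:
        per_word[word][tag] += 1
        overall[tag] += 1
    lexicon = {w: counts.most_common(1)[0][0] for w, counts in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag


def tag(tokens, lexicon, default_tag):
    """Tag known words from the lexicon; unknown words get the default tag."""
    return [(t, lexicon.get(t, default_tag)) for t in tokens]


pairs = load_corpus(SAMPLE)
lexicon, default_tag = train_baseline(pairs)
print(tag(["كتب", "في", "مدرسة"], lexicon, default_tag))
```

With this sample, كتب is tagged VERB (seen twice as a verb, once as a noun), and the unseen word مدرسة falls back to NOUN, the most frequent tag overall. A baseline like this ignores context entirely, so it cannot separate the two readings of كتب; that limitation is what context-aware taggers trained on larger annotated corpora aim to address.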
Keywords: Arabic Language, Corpus, Linguistics, Part of Speech Tagging