Building Arabic Corpus Applied to Part-of-Speech Tagging

Rabab Ali Abumalloh; Hassan Maudi Al Sarhan  and Waheeb Abu Ulbeh

doi:10.17485/ijst/2016/v9i46/107110

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 46, Pages: 1-9

Original Article

Rabab Ali Abumalloh¹*, Hassan Maudi Al-Sarhan² and Waheeb Abu-Ulbeh³

¹University of Dammam, Department of Computer Science, Dammam - 31451, Saudi Arabia; [email protected]
²Ajloun National University, Information Technology, Ajloun - 26810, Jordan.
³Universiti Teknologi Malaysia, UTM Johor Bahru, 81310 Johor, Malaysia.

*Author for correspondence
Rabab Ali Abumalloh, University of Dammam, Department of Computer Science, Dammam - 31451, Saudi Arabia; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objective: This paper aimed to review corpus linguistics sources related to part-of-speech tagging and to build an adequate annotated corpus for the Arabic language that contains Arabic words and their grammatical tags. Methods/Statistical Analysis: An in-depth survey conducted by the authors showed that there is a need for a free tagged Arabic corpus that can be used in natural language processing research. A corpus of 25,000 words was collected manually from different web sources written in Modern Standard Arabic. The collected words were tagged with reference to Arabic grammar books. Findings: The developed corpus can help researchers working on natural language processing applications. Applications/Improvements: The corpus needs to be expanded to include more words and their grammatical tags.

Keywords: Arabic Language, Corpus, Linguistics, Part-of-Speech Tagging