HMM Chunker for Punjabi

Chandan Mittal, Vishal Goyal and Umrinderpal Singh

doi:10.17485/ijst/2015/v8i35/85367


Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 35, Pages: 1-5

Original Article

HMM Chunker for Punjabi

Chandan Mittal*, Vishal Goyal and Umrinderpal Singh

Department of Computer Science, Punjab University, Patiala – 147002, India

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

This paper presents a Hidden Markov Model (HMM) based chunker for Punjabi. Chunking is the process of segmenting text into syntactically correlated word groups, known as chunks, and then assigning a label to each chunk. A robust chunker is an important component of many Natural Language Processing (NLP) applications. The goal of this work is to develop an HMM-based chunker for the Punjabi language. The chunker is statistical: the Viterbi algorithm finds the most probable sequence of chunks, and the Baum-Welch algorithm is used to train the system on 25,000 lines of chunked Punjabi text. An annotated text file of 1,000 lines is used for testing. The system identifies chunk boundaries with an accuracy of about 80%; the labelling accuracy is stated as about 98% and also as about 82%.
Keywords: Baum-Welch Algorithm, Chunking, Hidden Markov Model, Viterbi Algorithm
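
The Viterbi decoding step mentioned in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the BIO-style chunk tags (B-NP, I-NP, B-VGF), the use of part-of-speech tags as observations, the function name viterbi, and all probability values, which are invented toy numbers rather than parameters estimated from the Punjabi corpus. Log-probabilities are used to avoid numerical underflow on long sentences, and zero probabilities are floored to a small constant.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable hidden-state sequence for the observations."""
    if not observations:
        return []

    def lg(p):
        # Floor zero probabilities so that log() is always defined.
        return math.log(max(p, floor))

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: lg(start_p.get(s, 0.0)) + lg(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lg(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lg(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model (hypothetical values): hidden states are chunk tags,
# observations are part-of-speech tags.
STATES = ["B-NP", "I-NP", "B-VGF"]
START_P = {"B-NP": 0.8, "I-NP": 0.0, "B-VGF": 0.2}
TRANS_P = {
    "B-NP": {"B-NP": 0.2, "I-NP": 0.6, "B-VGF": 0.2},
    "I-NP": {"B-NP": 0.3, "I-NP": 0.4, "B-VGF": 0.3},
    "B-VGF": {"B-NP": 0.6, "I-NP": 0.0, "B-VGF": 0.4},
}
EMIT_P = {
    "B-NP": {"DEM": 0.5, "NN": 0.4, "PSP": 0.05, "VM": 0.05},
    "I-NP": {"DEM": 0.05, "NN": 0.5, "PSP": 0.4, "VM": 0.05},
    "B-VGF": {"DEM": 0.0, "NN": 0.05, "PSP": 0.05, "VM": 0.9},
}

if __name__ == "__main__":
    pos_tags = ["DEM", "NN", "PSP", "VM"]
    print(viterbi(pos_tags, STATES, START_P, TRANS_P, EMIT_P))
    # → ['B-NP', 'I-NP', 'I-NP', 'B-VGF']
```

In this encoding a chunk boundary falls wherever a B- tag begins, so a single decoded tag sequence yields both the chunk boundaries and the chunk labels. In a trained system the three probability tables would be estimated from the chunked training text instead of being written by hand.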