Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 35, Pages: 1-5
Chandan Mittal*, Vishal Goyal and Umrinderpal Singh
This paper presents a Hidden Markov Model (HMM) based chunker for Punjabi. Chunking is the process of segmenting text into syntactically correlated word groups, known as chunks, and then assigning a label to each chunk. A robust chunker is an important component of many Natural Language Processing (NLP) applications. The goal of this work is to develop an HMM based chunker for the Punjabi language. The chunker is statistical: the Viterbi algorithm is used to find the most probable sequence of chunks, and the system is trained with the Baum-Welch algorithm on 25,000 lines of chunked Punjabi text. An annotated text file of 1,000 lines is used for testing. The system identifies chunk boundaries with an accuracy of about 80%; the labelling accuracy is stated as about 98% and also as about 82%.
Keywords: Baum-Welch Algorithm, Chunking, Hidden Markov Model, Viterbi Algorithm
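The abstract names the Viterbi algorithm but gives no model details, so the following is only a minimal sketch of how Viterbi decoding could pick the most probable chunk-tag sequence. Everything specific in it is an assumption made for illustration: the B-/I- style chunk tags, the use of part-of-speech tags as observations, and all probability values are hypothetical and are not taken from the paper.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s]: probability of the best tag sequence ending in state s at position t
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    # back[t][s]: the previous state on that best sequence
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Choose the predecessor that maximises the path probability into s.
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s].get(observations[t], 0.0)
            back[t][s] = prev

    # Start from the best final state and follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: chunk tags as hidden states, POS tags as observations.
states = ["B-NP", "I-NP", "B-VP"]
start_p = {"B-NP": 0.7, "I-NP": 0.0, "B-VP": 0.3}
trans_p = {
    "B-NP": {"B-NP": 0.2, "I-NP": 0.6, "B-VP": 0.2},
    "I-NP": {"B-NP": 0.2, "I-NP": 0.3, "B-VP": 0.5},
    "B-VP": {"B-NP": 0.6, "I-NP": 0.0, "B-VP": 0.4},
}
emit_p = {
    "B-NP": {"ADJ": 0.5, "N": 0.5},
    "I-NP": {"ADJ": 0.2, "N": 0.8},
    "B-VP": {"V": 1.0},
}

path, prob = viterbi(["ADJ", "N", "V"], states, start_p, trans_p, emit_p)
print(path, prob)
```

In a trained system the three probability tables would be estimated from the chunked corpus rather than written by hand, and log probabilities would normally replace the raw products to avoid underflow on long sentences.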