Indian Journal of Science and Technology
Year: 2023, Volume: 16, Issue: Special Issue 2, Pages: 53-59
Rituraj Phukan¹*, Nomi Baruah¹, Shikhar Kr Sarma², Darpanjit Konwar¹
¹Dibrugarh University, Dibrugarh, Assam, India
²Gauhati University, Guwahati, Assam, India
Received Date: 23 March 2023, Accepted Date: 26 June 2023, Published Date: 02 November 2023
Objectives: This study applies the Viterbi algorithm to Part-of-Speech (POS) tagging of Assamese, with particular attention to out-of-vocabulary (OOV) words. The tagger's performance is assessed under training-to-testing data ratios of 50:50, 70:30, and 90:10.

Methods: The Viterbi algorithm is a dynamic programming method that finds the most likely sequence of hidden states (here, POS tags) given a sequence of observed events (here, words). A corpus of approximately 50,000 words is divided into training and testing portions at each of the three ratios.

Findings: The Viterbi-based tagger achieves an accuracy of 86.34%, surpassing the state-of-the-art POS taggers for Assamese. On out-of-vocabulary words, it achieves 6.14% higher accuracy than previously published work, highlighting its effectiveness for less-resourced languages such as Assamese.

Novelty: These results contribute to the understanding and development of POS tagging techniques for less-resourced languages such as Assamese. Beyond its higher accuracy, the approach shows potential for improving POS tagging in similar linguistic contexts.
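To make the Methods summary concrete, here is a minimal sketch of Viterbi decoding over a first-order (bigram) hidden Markov model, with a sequential train/test split at the 50:50, 70:30 and 90:10 ratios and a separate OOV accuracy figure. It is an illustration under stated assumptions, not the authors' implementation: the add-one smoothing, the unknown-word handling, the helper names (`split_corpus`, `HMMTagger`, `evaluate`), the tag labels and the tiny English toy corpus are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus of (word, tag) sentences: placeholder English tokens
# and tags, NOT the paper's ~50,000-word Assamese corpus or its tagset.
TOY_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VB")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VB")],
    [("the", "DT"), ("cat", "NN"), ("runs", "VB")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VB")],
]


def split_corpus(sentences, train_ratio):
    """Sequential split, e.g. train_ratio=0.7 gives a 70:30 train/test split."""
    cut = int(len(sentences) * train_ratio)
    return sentences[:cut], sentences[cut:]


class HMMTagger:
    """Bigram HMM tagger; add-one smoothing is an assumed choice."""

    def __init__(self, sentences):
        self.trans = defaultdict(lambda: defaultdict(int))  # prev tag -> tag -> count
        self.emit = defaultdict(lambda: defaultdict(int))   # tag -> word -> count
        self.tag_count = defaultdict(int)
        self.vocab = set()
        for sent in sentences:
            prev = "<s>"  # sentence-start pseudo-tag
            for word, tag in sent:
                self.trans[prev][tag] += 1
                self.emit[tag][word] += 1
                self.tag_count[tag] += 1
                self.vocab.add(word)
                prev = tag
        self.tags = sorted(self.tag_count)

    def log_trans(self, prev, tag):
        row = self.trans[prev]
        total = sum(row.values())
        return math.log((row[tag] + 1) / (total + len(self.tags)))

    def log_emit(self, tag, word):
        # An OOV word has count 0 under every tag, so its emission score depends
        # only on tag frequency and the transition scores largely decide its tag.
        count = self.emit[tag][word] if word in self.vocab else 0
        return math.log((count + 1) / (self.tag_count[tag] + len(self.vocab) + 1))

    def viterbi(self, words):
        """Most probable tag sequence for `words` via dynamic programming."""
        if not words:
            return []
        # best[i][t] = (log score of best path ending in tag t at i, backpointer)
        best = [{t: (self.log_trans("<s>", t) + self.log_emit(t, words[0]), None)
                 for t in self.tags}]
        for i in range(1, len(words)):
            column = {}
            for t in self.tags:
                score, prev = max(
                    (best[i - 1][p][0] + self.log_trans(p, t), p) for p in self.tags
                )
                column[t] = (score + self.log_emit(t, words[i]), prev)
            best.append(column)
        # Trace back from the highest-scoring tag at the last position.
        tag = max(self.tags, key=lambda t: best[-1][t][0])
        path = [tag]
        for i in range(len(words) - 1, 0, -1):
            tag = best[i][tag][1]
            path.append(tag)
        return path[::-1]


def evaluate(tagger, test_sentences):
    """Overall token accuracy and accuracy on out-of-vocabulary tokens only."""
    correct = total = oov_correct = oov_total = 0
    for sent in test_sentences:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        for w, g, p in zip(words, gold, tagger.viterbi(words)):
            total += 1
            correct += g == p
            if w not in tagger.vocab:
                oov_total += 1
                oov_correct += g == p
    acc = correct / total if total else 0.0
    oov_acc = oov_correct / oov_total if oov_total else 0.0
    return acc, oov_acc


if __name__ == "__main__":
    for ratio in (0.5, 0.7, 0.9):  # the 50:50, 70:30 and 90:10 ratios
        train, test = split_corpus(TOY_CORPUS, ratio)
        acc, oov_acc = evaluate(HMMTagger(train), test)
        print(f"{ratio:.0%} train: accuracy={acc:.2%}, OOV accuracy={oov_acc:.2%}")
```

Because `split_corpus` cuts sentences in order, results depend on how the corpus is arranged; shuffling before splitting is a common alternative. The smoothing and unknown-word model above are placeholders, since the abstract does not state which ones the study used.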
Keywords: Assamese, NLP, Out-of-vocabulary, POS Tagging, Viterbi Algorithm
© 2023 Phukan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee)