Parts-of-Speech Tagging for Unknown Words in Assamese using Viterbi Algorithm

Rituraj Phukan; Nomi Baruah; Shikhar Kr Sarma; Darpanjit Konwar

doi:10.17485/IJST/v16iSP2.8203


Indian Journal of Science and Technology

Year: 2023, Volume: 16, Issue: Special Issue 2, Pages: 53-59

Original Article

Rituraj Phukan¹*, Nomi Baruah¹, Shikhar Kr Sarma², Darpanjit Konwar¹

¹Dibrugarh University, Dibrugarh, Assam, India
²Gauhati University, Guwahati, Assam, India

*Corresponding Author
Email: [email protected]

Received Date: 23 March 2023, Accepted Date: 26 June 2023, Published Date: 02 November 2023

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: This study applies the Viterbi algorithm to Part-of-Speech (POS) tagging in the Assamese language, with a focus on tagging out-of-vocabulary words, and assesses the algorithm's performance at training-to-testing data ratios of 50:50, 70:30, and 90:10.

Methods: The study uses the Viterbi algorithm, a dynamic programming method, to determine the most likely sequence of hidden states (POS tags) given the observed events (words). A corpus of approximately 50,000 words is split in the stated ratios for training and testing.

Findings: The Viterbi algorithm achieves an accuracy of 86.34%, surpassing the state-of-the-art POS taggers for the Assamese language. The experimental evaluation shows that the proposed approach tags out-of-vocabulary words with 6.14% higher accuracy than previous work, which indicates that it is effective for less-resourced languages such as Assamese.

Novelty: The results contribute to the understanding and development of POS tagging techniques for less-resourced languages such as Assamese. Beyond its higher accuracy than previous research, the proposed approach shows potential for improving POS tagging in similar linguistic contexts.
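
The Viterbi decoding named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the bigram HMM, the add-one smoothing used to give unknown words a non-zero emission probability, the function names (`train_hmm`, `log_prob`, `viterbi`), and the English toy corpus are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(lambda: defaultdict(int))
    emissions = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # assumed sentence-start marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab

def log_prob(counts, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (assumed smoothing choice)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + alpha) / (total + alpha * n_outcomes))

def viterbi(words, transitions, emissions, vocab):
    """Return the most likely tag sequence for `words`."""
    if not words:
        return []
    tags = sorted(emissions)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # best[i][tag] = (best log score of a path ending in tag at i, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions["<s>"], tag, len(tags))
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], tag, len(tags)) + emit, p)
                for p in tags
            )
    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Hypothetical toy corpus; "zebra" never appears in it.
toy_corpus = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("a", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("the", "DET"), ("cat", "N"), ("runs", "V")],
]
model = train_hmm(toy_corpus)
print(viterbi(["the", "zebra", "runs"], *model))  # prints ['DET', 'N', 'V']
```

In this sketch the unknown word receives the same smoothed emission probability under every tag, so its tag is decided by the transition probabilities from its neighbours. The article may handle unknown words differently.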

Keywords: Assamese, NLP, Out-of-vocabulary, POS Tagging, Viterbi Algorithm

Copyright

© 2023 Phukan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee).