P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2023, Volume: 16, Issue: Special Issue 2, Pages: 53-59

Original Article

Parts-of-Speech Tagging for Unknown Words in Assamese using Viterbi Algorithm

Received Date: 23 March 2023, Accepted Date: 26 June 2023, Published Date: 02 November 2023

Abstract

Objectives: This study applies the Viterbi algorithm to Part-of-Speech (POS) tagging in Assamese, with a focus on tagging out-of-vocabulary words. It assesses the algorithm's performance at training-to-testing data ratios of 50:50, 70:30, and 90:10.

Methods: The study uses the Viterbi algorithm, a dynamic programming method, to determine the most likely sequence of hidden states (POS tags) given the observed events (words). A corpus of approximately 50,000 words is split at each of these ratios for training and testing.

Findings: The Viterbi algorithm achieves an accuracy of 86.34%, surpassing state-of-the-art POS taggers for Assamese. In the experimental evaluation, the proposed approach achieves 6.14% higher accuracy than previous work in tagging out-of-vocabulary words, which indicates its effectiveness for less-resourced languages such as Assamese.

Novelty: The results contribute to the understanding and development of POS tagging techniques for less-resourced languages such as Assamese. Beyond its higher accuracy, the approach shows potential for improving POS tagging in similar linguistic contexts.

Keywords: Assamese, NLP, Out-of-vocabulary, POS Tagging, Viterbi Algorithm
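
The Methods summary describes Viterbi decoding over hidden POS tags, with particular attention to words not seen in training. The following is a minimal sketch of that general technique, not the authors' implementation. The function names (`train_hmm`, `log_prob`, `viterbi`) are hypothetical, the add-one smoothing used to score unseen words is an assumption (the abstract does not say how out-of-vocabulary words are handled), and the English tokens are placeholders rather than data from the paper's Assamese corpus.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker


def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from tagged sentences."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted words
    vocabulary = set()
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word] += 1
            vocabulary.add(word)
            previous = tag
    return transitions, emissions, vocabulary


def log_prob(counter, key, num_outcomes):
    """Add-one smoothed log-probability (the smoothing is an assumption)."""
    total = sum(counter.values())
    return math.log((counter[key] + 1) / (total + num_outcomes))


def viterbi(words, transitions, emissions, vocabulary):
    """Return the most likely tag sequence for `words`."""
    if not words:
        return []
    tags = sorted(emissions)
    n_tags = len(tags)
    n_words = len(vocabulary) + 1  # one extra slot reserved for unseen words

    # best[i][tag] = (log-probability of the best path ending in tag, backpointer)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions[START], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)

    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][prev][0]
                 + log_prob(transitions[prev], tag, n_tags) + emit, prev)
                for prev in tags
            )

    # Follow backpointers from the best final tag to recover the path.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return path


# Toy placeholder corpus; "zebra" never appears in training.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]
model = train_hmm(corpus)
predicted = viterbi(["the", "zebra", "runs"], *model)
print(predicted)
```

Because every tag assigns the same smoothed emission score to an unseen word, the tag chosen for it in this sketch is driven by the transition probabilities from its neighbours. Working in log-probabilities avoids numerical underflow on longer sentences.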

Copyright

© 2023 Phukan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published by Indian Society for Education and Environment (iSee)
