P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2023, Volume: 16, Issue: Special Issue 2, Pages: 38-43

Original Article

Assamese Inflectional Rule-Based Stemmer

Received Date: 23 March 2023, Accepted Date: 26 June 2023, Published Date: 02 November 2023

Abstract

This paper presents an Assamese inflectional rule-based stemmer with supervised rules that addresses the challenges posed by the language’s complex morphology. Objectives: To create a stemmer that accurately captures a word’s context while taking its inflectional forms into account. Methods: A rule-based approach was used, enhanced with Parts-Of-Speech tagged (POS-tagged) data and Assamese WordNet. A corpus of 50,000 words from diverse sources was used together with the POS-tagged data, and Assamese WordNet was added to improve the stemmer’s performance. Findings: The evaluation showed a substantial improvement: the stemmer achieved 91% accuracy with the tagged data, compared with 86% without tags. By combining POS tagging with Assamese WordNet, the proposed stemmer captures word meanings and their inflectional variations. Novelty: This research advances Assamese language processing and benefits applications such as sentiment analysis and information retrieval systems. The development of accurate stemming methods closes the prevailing gap in elemental processing and enables information retrieval systems to operate more quickly and accurately.
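
The general idea described under Methods (suffix rules selected by POS tag, with a lexicon check standing in for Assamese WordNet) can be illustrated with a minimal sketch. This is not the authors' implementation: the romanized suffix lists, the tiny lexicon, and the names `SUFFIX_RULES`, `LEXICON`, and `stem` are all hypothetical and chosen only for illustration.

```python
# Toy, romanized suffix rules keyed by POS tag.
# Hypothetical examples only; NOT the rule set used in the paper.
SUFFIX_RULES = {
    "NOUN": ["bor", "khon", "ot", "or"],
    "VERB": ["isil", "ise"],
}

# Stand-in for an Assamese WordNet lookup: a small set of known root forms.
LEXICON = {"manuh", "ghor", "kitap", "kha"}


def stem(word, pos=None, lexicon=LEXICON):
    """Strip the longest suffix allowed for `pos` whose remainder is a known root.

    If `pos` is None, the suffixes of all POS classes are tried (the
    "untagged" setting). If no rule yields a known root, the word is
    returned unchanged.
    """
    if word in lexicon:
        return word
    if pos is None:
        candidates = [s for rules in SUFFIX_RULES.values() for s in rules]
    else:
        candidates = SUFFIX_RULES.get(pos, [])
    # Try longer suffixes first so the most specific rule wins.
    for suffix in sorted(candidates, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            root = word[: -len(suffix)]
            if root in lexicon:
                return root
    return word


def accuracy(samples, use_pos=True):
    """Fraction of (word, pos, gold_stem) samples stemmed correctly."""
    correct = sum(
        stem(word, pos if use_pos else None) == gold
        for word, pos, gold in samples
    )
    return correct / len(samples)
```

For example, `stem("manuhbor", "NOUN")` strips the toy plural suffix `bor` and returns the root `manuh` because that root is in the lexicon, while an unknown word is returned as-is. The `accuracy` helper shows how tagged and untagged runs could be compared on a gold-standard list; it does not reproduce the figures reported above.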

Keywords: Stemming, POS-tag, NLP, Suffix, Prefix

Copyright

© 2023 Neog & Baruah. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)