P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2023, Volume: 16, Issue: 37, Pages: 3050-3063

Original Article

Enhancement in Stemmer Design: Natural Language Semantics Perspective

Received Date: 27 March 2023, Accepted Date: 17 August 2023, Published Date: 30 September 2023

Abstract

Objective: To enhance the performance and accuracy of the stemming process.

Method: The Porter stemmer is conventionally used to remove common morphological and inflectional endings (suffixes) from English words. It relies on a set of predefined rules that are less complex than those of other existing stemmers. We identified several imprecisions that arise during the stemming process and propose solutions to address them.

Findings: The experiment was performed on a set of 762 words beginning with the letters “a”, “b”, and “c”. Of these 762 words used for system validation and testing, 355 produced a different stem with the Modified Porter Stemmer (MPS) than with the original Porter stemmer, while the remaining 407 produced the same stem with both stemmers. The MPS presented in this paper, implemented in Python, gave better results for 46% of the words.

Novelty: This paper highlights the errors encountered when using the Porter algorithm and provides solutions to enhance the performance and accuracy of the stemming process. The resulting stemmer is named the “Modified Porter Stemmer” (MPS).
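The comparison reported in the findings (how many words receive a different stem from two stemmers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_baseline_stem`, `toy_modified_stem`, and `compare_stemmers` are hypothetical names, and the suffix rules are deliberately simplistic stand-ins, not Porter's rule set and not the MPS rules, which the abstract does not list.

```python
from typing import Callable, Dict, Iterable

# Illustrative suffixes only; NOT Porter's actual rules.
SUFFIXES = ("ing", "ed", "s")


def toy_baseline_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_modified_stem(word: str) -> str:
    """Baseline plus one hypothetical extra rule: undouble a final consonant.

    This stands in for "a modified stemmer"; it is NOT the paper's MPS.
    """
    stem = toy_baseline_stem(word)
    doubled = len(stem) >= 2 and stem[-1] == stem[-2]
    if stem != word and doubled and stem[-1] not in "aeioulsz":
        return stem[:-1]
    return stem


def compare_stemmers(
    words: Iterable[str],
    stem_a: Callable[[str], str],
    stem_b: Callable[[str], str],
) -> Dict[str, float]:
    """Count how many words the two stemmers stem differently."""
    word_list = list(words)
    total = len(word_list)
    different = sum(1 for w in word_list if stem_a(w) != stem_b(w))
    pct = 100 * different / total if total else 0.0
    return {
        "total": total,
        "different": different,
        "same": total - different,
        "pct_different": pct,
    }


if __name__ == "__main__":
    sample = ["running", "banned", "cats", "calling", "able"]
    print(compare_stemmers(sample, toy_baseline_stem, toy_modified_stem))

    # Arithmetic check on the figures reported in the abstract.
    print(355 + 407)                  # total words tested
    print(round(100 * 355 / 762, 1))  # share of words with differing stems
```

Note that a differing stem is not automatically a better one; judging which output is correct requires a reference list of expected stems, which this sketch does not include.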

Keywords: Natural Language Processing; Stemmer; Porter’s Stemmer; Enhancement; Stemming Process

Copyright

© 2023 Rihan & Astikar. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by the Indian Society for Education and Environment (iSee).
