Indian Journal of Science and Technology
Year: 2023, Volume: 16, Issue: 13, Pages: 1021-1029
Tawseef Ahmad Mir1, Aadil Ahmad Lawaye2*, Parveen Rana2, Ghayas Ahmed1
1Research Scholar, Department of Computer Sciences, Baba Ghulam Shah Badshah University, Rajouri, India
2Assistant Professor, Department of Computer Sciences, Baba Ghulam Shah Badshah University, Rajouri, India
Email: [email protected]
Received Date: 15 December 2022, Accepted Date: 04 March 2023, Published Date: 06 April 2023
Objectives: This work makes a first attempt at developing a sense-annotated corpus for Kashmiri lexical sample Word Sense Disambiguation (WSD). Supervised WSD techniques, which are the most effective techniques for WSD, require a sense-annotated dataset. Because developing a sense-tagged dataset is arduous, such datasets are not available for all natural languages. Kashmiri, being a computationally low-resource language, has no sense-tagged corpus available for research purposes. Methods: To develop the sense-annotated dataset, we selected 60 commonly used ambiguous Kashmiri words and annotated the dataset manually. The usefulness of the dataset is examined by training machine learning algorithms (k-NN, Decision Tree (DT), and Support Vector Machine (SVM)) on it, using Part-of-Speech (PoS) and Bag-of-Words (BoW) features. Findings: The performance of the machine learning algorithms for Kashmiri WSD is evaluated using the accuracy metric. Of the classifiers used, SVM performed best, with an average accuracy of 75.74%. Novelty: This research is the first attempt to develop a sense-tagged dataset for the Kashmiri language. The developed dataset should be of value to the research community and can be used in various Natural Language Processing tasks such as WSD and part-of-speech tagging.
Keywords: Sense Annotation; Machine Learning; Word Sense Disambiguation; WordNet; Part-of-Speech Tagging
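The lexical-sample setup described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only Bag-of-Words features and a k-NN classifier (no PoS features, Decision Tree, or SVM), and it uses a hypothetical English toy set for the target word "bank" because the Kashmiri dataset is not reproduced here. The names `TRAIN`, `TEST`, `STOPWORDS`, `bow`, `cosine`, `knn_predict`, and `accuracy` are all assumptions introduced for illustration.

```python
from collections import Counter
import math

# Hypothetical toy data for one ambiguous target word; each instance is a
# (context sentence, sense label) pair, as in a lexical-sample WSD dataset.
TARGET = "bank"
TRAIN = [
    ("he deposited money in the bank account", "finance"),
    ("the bank approved the loan for the house", "finance"),
    ("she opened a savings account at the bank", "finance"),
    ("they sat on the river bank and fished", "river"),
    ("the boat drifted toward the muddy bank of the river", "river"),
    ("grass grew along the bank of the stream", "river"),
]
TEST = [
    ("the bank charged interest on the loan account", "finance"),
    ("we walked along the river bank", "river"),
]

# Assumed stop-word list; the target word itself is also dropped because it
# occurs in every instance and carries no sense information.
STOPWORDS = {"the", "a", "in", "on", "at", "of", "for", "and",
             "he", "she", "they", "we", TARGET}


def bow(sentence):
    """Bag-of-Words vector: counts of context words, minus stop words."""
    return Counter(w for w in sentence.lower().split() if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def knn_predict(sentence, train, k=3):
    """Majority sense among the k training contexts most similar to the query."""
    query = bow(sentence)
    ranked = sorted(train, key=lambda ex: cosine(query, bow(ex[0])), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(test, train, k=3):
    """Fraction of test instances whose predicted sense matches the gold sense."""
    correct = sum(knn_predict(s, train, k) == gold for s, gold in test)
    return correct / len(test)


if __name__ == "__main__":
    for sentence, gold in TEST:
        print(sentence, "->", knn_predict(sentence, TRAIN), "(gold:", gold + ")")
    print("accuracy:", accuracy(TEST, TRAIN))
```

In a lexical-sample setting, one such classifier would be trained per ambiguous word, and the per-word accuracies averaged to give a figure comparable to the average accuracy reported in the Findings.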
© 2023 Mir et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)