P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2023, Volume: 16, Issue: 13, Pages: 1021-1029

Original Article

Building Kashmiri Sense Annotated Corpus and its Usage in Supervised Word Sense Disambiguation

Received Date: 15 December 2022, Accepted Date: 04 March 2023, Published Date: 06 April 2023


Objectives: This work makes a first attempt at developing a sense-annotated corpus for Kashmiri Lexical Sample Word Sense Disambiguation (WSD). A sense-annotated dataset is required for supervised WSD techniques, which are the most effective techniques for carrying out WSD. Because developing a sense-tagged dataset is an arduous task, such datasets are not available for all natural languages. Kashmiri, being computationally a low-resource language, does not have a sense-tagged corpus available for research purposes. Methods: To develop the sense-annotated dataset, we selected 60 commonly used ambiguous Kashmiri words and annotated the dataset manually. The usefulness of the dataset is also examined by applying machine learning algorithms (k-NN, Decision Tree (DT) and Support Vector Machine (SVM)) to it. Part of Speech (PoS) and Bag of Words (BoW) features are used to train the classifiers. Findings: The performance of the machine learning algorithms for Kashmiri WSD is evaluated using the accuracy metric. Of the classifiers used, SVM showed the best performance, with an average accuracy of 75.74%. Novelty: This research is the first attempt to develop a sense-tagged dataset for the Kashmiri language. The developed dataset would be of great importance to the research community and can be used in various Natural Language Processing tasks such as WSD and part-of-speech tagging.
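As a rough illustration of the lexical sample setting described in the abstract, the following minimal sketch trains a k-NN classifier on Bag of Words context features for a single ambiguous target word and reports accuracy. It is not the authors' implementation. The English target word "bank", the example sentences, the sense labels, the stop-word list and all function names are hypothetical, chosen only to keep the example readable; PoS features and the DT and SVM classifiers mentioned in the abstract are omitted.

```python
from collections import Counter
import math

# Hypothetical stop-word list; a real system would use a language-specific one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "at", "for", "was"}


def bow_features(sentence, target):
    """Bag-of-words counts of the context, excluding the target and stop words."""
    tokens = sentence.lower().split()
    return Counter(t for t in tokens if t != target and t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, query, k=3):
    """Return the majority sense among the k most similar training contexts."""
    ranked = sorted(train, key=lambda ex: cosine(ex[0], query), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test contexts whose predicted sense matches the gold sense."""
    correct = sum(knn_predict(train, feats, k) == sense for feats, sense in test)
    return correct / len(test)


# Toy sense-annotated data for one target word (hypothetical, not from the corpus).
TARGET = "bank"
TRAIN_SENTENCES = [
    ("she deposited money in the bank", "finance"),
    ("the bank approved the loan", "finance"),
    ("he opened an account at the bank", "finance"),
    ("they sat on the bank of the river", "river"),
    ("the river bank was muddy", "river"),
    ("fish swim near the bank of the stream", "river"),
]
TEST_SENTENCES = [
    ("the bank gave her a loan for money", "finance"),
    ("the muddy bank of the river flooded", "river"),
]

train = [(bow_features(s, TARGET), sense) for s, sense in TRAIN_SENTENCES]
test = [(bow_features(s, TARGET), sense) for s, sense in TEST_SENTENCES]

print("accuracy:", accuracy(train, test, k=3))
```

In a lexical sample setup, one such classifier would be trained per ambiguous word, and the per-word accuracies averaged to give a single figure. On this toy data both held-out contexts are assigned their expected sense.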

Keywords: Sense Annotation; Machine Learning; Word Sense Disambiguation; WordNet; Part-of-Speech Tagging


© 2023 Mir et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee


