P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2022, Volume: 15, Issue: 43, Pages: 2275-2281

Original Article

Natural Language Processing Resources for the Kashmiri Language

Received Date: 30 September 2022, Accepted Date: 11 October 2022, Published Date: 16 November 2022


Objectives: This paper is a first attempt to identify the basic resources needed to undertake Natural Language Processing (NLP) research on the Kashmiri language. It also discusses key issues in NLP for Kashmiri, such as complex linguistic phenomena, the lack of standard linguistic tools and of documented, standardized resources, and the influence of dominant languages, mainly Urdu and English, on Kashmiri.

Methods: As no substantial work specific to NLP for Kashmiri has been reported in the literature, a holistic research strategy was adopted to explore possible sources from which basic resources for Kashmiri NLP research could be created.

Findings: A thorough investigation showed that only limited work related to Machine Translation of Kashmiri has been reported in the literature. A few newspapers are published in Kashmiri and can be used to build a Kashmiri corpus. Moreover, crowdsourcing could be used as a potential means of developing digital linguistic resources for Kashmiri.

Novelty: The present study is a first attempt at identifying NLP resources for the Kashmiri language and should be valuable to researchers interested in developing Kashmiri in the digital domain.

Keywords: Natural Language Processing; Transliteration; Kashmiri Language; Scheduled Languages; Crowdsource; Tag Set; P-o-S Tagging




© 2022 Lone et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

