P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2020, Volume: 13, Issue: 39, Pages: 4189-4201

Original Article

Edit distance-based search approach for retrieving element-wise prosody/rhymes in Hindi-Urdu poetry

Received Date: 25 August 2020, Accepted Date: 24 September 2020, Published Date: 09 November 2020


Background: Prosody (rhyming words) is an inherent element of poetry across thousands of the world's languages. Since the medieval era, Indic poetry (principally Hindi/Urdu poetry) has developed a striking richness in its subjects, styles, and other creative aspects. Beyond the message of a poem, the Qafiya (i.e., the rhyming words) is the core element, without which a text is not regarded as Hindi/Urdu poetry but merely as a piece of writing; alongside it, the Radif (i.e., a phrasal suffix to the qafiya) is considered the next most intrinsic part of a Ghazal. The contributions of this paper are (1) the development of an optimal technique for prosodic (qafiya) suggestion and retrieval in Hindi/Urdu poetry, and (2) qafiya suggestions based on the radif that follows.

Methods: The work uses a tri-script poetry corpus of 13.46 million tokens. Instead of matching phonetic values, the proposed methodology employs four edit distances (Levenshtein, Damerau–Levenshtein, Jaro–Winkler, and Hamming distance) as comparison measures for prosodic suggestions.

Findings: The proposed work shows better results than the 'Qaafiya Dictionary' powered by rekhta.org. Moreover, with respect to inter-metric similarity and running time, Jaro–Winkler appears to be the best-suited algorithm for rhyme suggestion, whereas Levenshtein distance fares worst.

Novelty/Applications: This work benefits researchers in Indic natural language processing who work on lexical look-ups and the analysis of creative literature, especially poetry.

Keywords: Natural language processing, information retrieval, poetry, prosody, Hindi, Urdu
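To make the idea of edit-distance-based rhyme suggestion concrete, the following is a minimal sketch and not the authors' implementation. It assumes romanized words, a tiny hypothetical lexicon, and a hypothetical `suggest_rhymes` helper that compares only the last `tail` characters of each word. The abstract does not say how word endings are selected, so that choice is an assumption. Jaro–Winkler is omitted for brevity, and the Damerau–Levenshtein function is the restricted "optimal string alignment" variant.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Allow swapping two adjacent characters at a cost of 1.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def suggest_rhymes(query, lexicon, distance=levenshtein, tail=3, limit=5):
    """Rank lexicon words by the distance between word endings (hypothetical helper)."""
    tail = min(tail, len(query))
    q_tail = query[-tail:]
    scored = [(distance(q_tail, w[-tail:]), w)
              for w in lexicon
              if w != query and len(w) >= tail]
    scored.sort()  # smallest distance first; ties broken alphabetically
    return [w for _, w in scored[:limit]]


# Hypothetical romanized lexicon, for illustration only.
lexicon = ["mehfil", "qatil", "bismil", "nazar", "safar", "dil"]
print(suggest_rhymes("manzil", lexicon, limit=4))
```

Any of the three distance functions can be passed through the `distance` parameter. Because every compared ending has the same length, Hamming distance stays well defined here; a real system working on Devanagari or Perso-Arabic script would also need script-aware normalization, which this sketch does not attempt.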




© 2020 Khan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee).

