Hindi-Kannada Named Entity Transliteration: Issues and Possible Solutions

Annarao Kulkarni, B. R. Srivatsa and Chetan Baji

doi:10.17485/ijst/2015/v8i27/81615

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 27, Pages: 1-5

Original Article

Centre for Development of Advanced Computing (C-DAC), Bengaluru - 560100, Karnataka, India

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Indian languages belong to four language families, namely the Indo-Aryan, Dravidian, Tibeto-Burman and Austro-Asiatic. Hindi and Kannada belong to the Indo-Aryan and Dravidian families, respectively; their scripts both evolved from the ancient Brahmi script, and the two languages share a common phonetic structure. However, the conventions for writing Named Entities differ between them because of dialectal influence, language-specific rules and other factors. As a result, Named Entity Transliteration from Hindi to Kannada, and vice versa, is not a one-to-one character mapping. This introduces many problems in Machine Translation (MT), Cross-Lingual Information Retrieval (CLIR) and parallel corpus creation between Hindi and Kannada. This paper discusses the Named Entity Transliteration issues encountered while creating Hindi-to-Kannada parallel corpora for the Indian Language Corpus Initiative (ILCI) project. We discuss in detail the cases of characters with no exact equivalent between Hindi and Kannada, multiple mappings, diacritic marks, loan words and language-specific transliteration issues, and propose possible solutions. At the implementation level, these solutions can be realized using either Finite-State Transducers (FST) or Regular Expressions.
Keywords: Hindi, Kannada, Named Entity, Regular Expressions, Transliteration
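The abstract notes that the proposed solutions can be implemented with regular expressions. As a rough illustration only, the Python sketch below applies a few regex rewrite rules to the Hindi input and then uses a baseline code-point mapping that exploits the parallel layout of the Devanagari (U+0900 to U+097F) and Kannada (U+0C80 to U+0CFF) Unicode blocks. It then enumerates the short and long e/o spellings that Kannada distinguishes but Devanagari does not, which is one concrete case of a one-to-many mapping. The names `transliterate`, `candidates`, `PRE_RULES` and `E_O_ALTERNATIVES`, and the two rewrite rules themselves (dropping the nukta, turning chandrabindu into anusvara), are hypothetical assumptions for illustration, not the authors' rule set.

```python
import itertools
import re
import unicodedata

# The Devanagari and Kannada Unicode blocks follow a parallel layout,
# so most letters differ by a fixed code-point offset (0x380).
KANNADA_OFFSET = 0x0C80 - 0x0900

# Hypothetical rewrite rules applied to the Devanagari input before mapping.
# They illustrate the regex-based approach; they are NOT the paper's rules.
PRE_RULES = [
    (re.compile("\u093C"), ""),        # drop the nukta (e.g. ज़ -> ज), assumed convention
    (re.compile("\u0901"), "\u0902"),  # chandrabindu -> anusvara, assumed convention
]

# Devanagari has no short/long contrast for e and o, but Kannada does,
# so each long e/o produced by the offset map is ambiguous (one-to-many).
E_O_ALTERNATIVES = {
    "\u0C8F": ("\u0C8F", "\u0C8E"),  # independent vowel: long ಏ or short ಎ
    "\u0C93": ("\u0C93", "\u0C92"),  # independent vowel: long ಓ or short ಒ
    "\u0CC7": ("\u0CC7", "\u0CC6"),  # vowel sign: long ೇ or short ೆ
    "\u0CCB": ("\u0CCB", "\u0CCA"),  # vowel sign: long ೋ or short ೊ
}
E_O_PATTERN = re.compile("([" + "".join(E_O_ALTERNATIVES) + "])")


def transliterate(hindi: str) -> str:
    """Map Devanagari text to Kannada: regex rules first, then the code-point offset."""
    # NFD splits precomposed nukta letters (e.g. U+095B) into base letter + U+093C.
    text = unicodedata.normalize("NFD", hindi)
    for pattern, replacement in PRE_RULES:
        text = pattern.sub(replacement, text)
    out = []
    for ch in text:
        if 0x0900 <= ord(ch) <= 0x097F:
            target = chr(ord(ch) + KANNADA_OFFSET)
            # Keep the source character when the Kannada slot is unassigned in
            # Python's Unicode database (e.g. the danda, which Kannada borrows).
            out.append(target if unicodedata.name(target, None) else ch)
        else:
            out.append(ch)  # spaces, Latin letters, etc. pass through unchanged
    return "".join(out)


def candidates(kannada: str) -> list[str]:
    """Enumerate every short/long e-o spelling of a Kannada string."""
    parts = E_O_PATTERN.split(kannada)
    # With a capturing group, re.split puts the matched vowels at odd indices.
    options = [E_O_ALTERNATIVES[p] if i % 2 else (p,) for i, p in enumerate(parts)]
    return ["".join(choice) for choice in itertools.product(*options)]


if __name__ == "__main__":
    print(transliterate("राम"))
    print(candidates(transliterate("मोहन")))
```

For example, `transliterate("राम")` gives ರಾಮ, while `candidates(transliterate("मोहन"))` returns both ಮೋಹನ and ಮೊಹನ; a later step (for instance, a lookup in a list of attested spellings) would have to choose between them. Word-final schwa deletion, loan words and other context-dependent cases are left out of this sketch, since deciding them needs more than character-level rules.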