P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 27, Pages: 1-6

Original Article

Toward Normalizing Romanized Gurumukhi Text from Social Media


Roman characters are used to write Indian-language text on social media platforms such as Facebook and Twitter. Processing this text for NLP applications is not a trivial task, because it must be both transliterated and converted to a canonical form. This paper discusses the issues involved in normalizing such text for the Punjabi language and proposes an algorithm to normalize Punjabi text written in Roman script. The proposed algorithm generates all possible candidate combinations and then filters them using an n-gram language model.
Keywords: Text Normalization, Transliteration, Social Media Text
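The abstract describes the approach only at a high level: generate every candidate combination, then filter with an n-gram language model. The following is a minimal Python sketch of that idea under loudly stated assumptions, not the authors' implementation. The candidate lexicon (`CANDIDATES`), the two-sentence Gurmukhi corpus (`CORPUS`), the choice of a bigram model with add-one smoothing, and the function names `train_bigram`, `log_prob` and `normalize` are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical candidate lexicon: each Romanized spelling maps to possible
# Gurmukhi forms. Spelling variants ("reha"/"riha") share a target, and some
# candidates are deliberately spurious so the language model has to choose.
CANDIDATES = {
    "main": ["ਮੈਂ", "ਮੈਨ"],
    "ghar": ["ਘਰ", "ਘੜ"],
    "ja": ["ਜਾ", "ਜ"],
    "reha": ["ਰਿਹਾ", "ਰਹਾ"],
    "riha": ["ਰਿਹਾ", "ਰਹਾ"],
    "haan": ["ਹਾਂ", "ਹਾਨ"],
}

# Toy Gurmukhi training corpus (an assumption, far too small for real use).
CORPUS = ["ਮੈਂ ਘਰ ਜਾ ਰਿਹਾ ਹਾਂ", "ਮੈਂ ਘਰ ਹਾਂ"]


def train_bigram(corpus):
    """Count bigrams and their left-context tokens, with sentence markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>", *sentence.split(), "</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len({t for s in corpus for t in s.split()}) + 1  # +1 for </s>
    return contexts, bigrams, vocab_size


def log_prob(seq, model):
    """Add-one smoothed bigram log-probability of a candidate sequence."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>", *seq, "</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def normalize(roman_text, model):
    """Enumerate every candidate combination and keep the best-scoring one."""
    # Words missing from the lexicon are passed through unchanged.
    options = [CANDIDATES.get(w, [w]) for w in roman_text.lower().split()]
    best = max(product(*options), key=lambda seq: log_prob(seq, model))
    return " ".join(best)


model = train_bigram(CORPUS)
print(normalize("main ghar ja reha haan", model))
```

With this toy data the call selects ਮੈਂ ਘਰ ਜਾ ਰਿਹਾ ਹਾਂ ("I am going home"), because every spurious candidate is unseen and receives only smoothed probability mass. Exhaustive enumeration grows as the product of the per-word candidate counts, so it becomes exponential in sentence length. Because the bigram score is a sum of adjacent-pair terms, a Viterbi-style dynamic program would find the same best sequence in time linear in sentence length. The abstract does not say whether the authors prune the search in this or any other way.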