P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 27, Pages: 1-6

Original Article

Toward Normalizing Romanized Gurumukhi Text from Social Media

Abstract

Indian-language text on social media platforms such as Facebook and Twitter is often written in Roman characters. Processing this text for NLP applications is not trivial: it must be both transliterated and converted to a canonical form. This paper discusses the issues involved in normalizing such text for the Punjabi language. An algorithm is proposed to normalize Punjabi text written in Roman script. The proposed algorithm generates all possible candidate combinations and then filters them using an n-gram language model.
Keywords: Text Normalization, Transliteration, Social Media Text
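
The generate-then-filter idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the candidate table, the bigram counts, the add-one smoothing, and the function names (`bigram_logprob`, `score`, `normalize`) are all assumptions made for illustration, and the canonical forms shown are placeholder spellings rather than verified lexicon entries.

```python
from itertools import product
from math import log

# Hypothetical candidate table: each noisy Romanized token maps to the
# canonical forms it might stand for (placeholder spellings, not real data).
CANDIDATES = {
    "tusi": ["tusi", "tusin"],
    "kive": ["kiven", "kive"],
    "ho": ["ho"],
}

# Made-up counts standing in for a trained bigram language model.
BIGRAM_COUNTS = {
    ("<s>", "tusin"): 5,
    ("<s>", "tusi"): 1,
    ("tusin", "kiven"): 4,
    ("kiven", "ho"): 6,
}
UNIGRAM_COUNTS = {"<s>": 6, "tusin": 5, "tusi": 1, "kiven": 4, "kive": 1, "ho": 6}
VOCAB_SIZE = len(UNIGRAM_COUNTS)


def bigram_logprob(prev, word):
    """Log-probability of `word` given `prev`, with add-one smoothing."""
    numerator = BIGRAM_COUNTS.get((prev, word), 0) + 1
    denominator = UNIGRAM_COUNTS.get(prev, 0) + VOCAB_SIZE
    return log(numerator / denominator)


def score(sequence):
    """Sum of bigram log-probabilities over a candidate sequence."""
    total = 0.0
    prev = "<s>"  # sentence-start marker
    for word in sequence:
        total += bigram_logprob(prev, word)
        prev = word
    return total


def normalize(tokens):
    """Enumerate every combination of candidates and keep the best-scoring one."""
    # Tokens missing from the table are passed through unchanged.
    options = [CANDIDATES.get(token, [token]) for token in tokens]
    best = max(product(*options), key=score)
    return list(best)


print(normalize(["tusi", "kive", "ho"]))
```

Exhaustive enumeration grows exponentially with sentence length, so a practical system would likely prune candidates or use a beam or Viterbi search; the sketch keeps the brute-force form only to mirror the "all possible combinations, then filter" wording.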
