Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 27, Pages: 1-6
Jagroop Kaur and Jaswinder Singh*
Department of Computer Engineering, Punjabi University, Patiala - 147002, Punjab, India; [email protected]
Indian-language text on social media such as Facebook and Twitter is often written in Roman characters. Processing this text for NLP applications is not a trivial task: it must be transliterated into the native script and also converted to a canonical form. This paper discusses the issues involved in normalizing such text for the Punjabi language. An algorithm is proposed to normalize Punjabi text written in Roman script. The proposed algorithm generates all possible candidate combinations and then filters them using an n-gram language model.
Keywords: Text Normalization, Transliteration, Social Media Text
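The generate-then-filter idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the candidate table `CANDIDATES`, the one-sentence training corpus `CORPUS`, the function names, and the choice of an add-one smoothed bigram model are all assumptions made purely for illustration. Each Roman-script token is expanded into candidate Gurmukhi forms, every combination is enumerated, and the combination with the highest bigram log-probability is kept.

```python
import math
from collections import Counter
from itertools import product

START = "<s>"  # sentence-start marker for the bigram model

# Hypothetical toy candidate table (assumption, not from the paper):
# each Roman-script token maps to possible Gurmukhi spellings.
CANDIDATES = {
    "main": ["ਮੈਂ", "ਮੈਨ"],
    "mai": ["ਮੈਂ", "ਮਾਈ"],
    "ghar": ["ਘਰ", "ਗਰ"],
}

# Hypothetical toy training corpus of canonical Gurmukhi sentences.
CORPUS = [["ਮੈਂ", "ਘਰ"]]


def train_bigram(corpus):
    """Count unigrams and bigrams over start-padded sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        padded = [START] + sentence
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(sequence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = [START] + list(sequence)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def normalize(text, unigrams, bigrams):
    """Enumerate all candidate combinations and keep the best-scoring one."""
    # Tokens missing from the table are passed through unchanged.
    options = [CANDIDATES.get(token, [token]) for token in text.lower().split()]
    best = max(product(*options), key=lambda seq: log_prob(seq, unigrams, bigrams))
    return list(best)


unigrams, bigrams = train_bigram(CORPUS)
print(normalize("mai ghar", unigrams, bigrams))
```

Exhaustive enumeration grows exponentially with sentence length, so a practical system would likely prune candidates or use a dynamic-programming search; the brute-force form is used here only because it mirrors the description most directly.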