Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 27, Pages: 1-10
Amandeep Verma* and Amandeep Kaur Gahier
Department of Computer Science and Engineering, Punjabi University Regional Campus for IT and Management, Mohali - 160062, Chandigarh, India
Topic Modeling refers to discovering the theme of a document. The theme of a document provides an abstract view of the set of subjects (topics) it addresses, so Topic Modeling allows documents to be classified, arranged and searched according to their subjects. Topic Modeling has attracted the interest of many researchers in the fields of Text Mining, Natural Language Processing, Machine Learning, etc. The literature describes several techniques for generating the theme of a document, but most of the proposed Topic Models have been designed for the English language. For Indian languages, and for Punjabi in particular, such topic models are lacking in the literature. Although some Topic Summarization, Topic Tracking and Keyword Extraction systems have been developed for Punjabi, Topic Modeling differs considerably from these techniques. This paper presents a topic model for e-news in the Punjabi language. It is based on the simplest and most basic probabilistic topic model, LDA (Latent Dirichlet Allocation). The model finds the topics present in an input news text, together with their respective proportions. The theme generation process requires a Topic List Corpus at the back end of the topic model, so such a corpus has been built, containing Punjabi words that commonly occur in news articles, classified under different topic lists. The topic model has been tested on more than 1000 news articles to verify its accuracy, and the values of the various parameters measuring the quality of its output are quite satisfactory.
Keywords: Keyword Extraction, LDA, NLP, Probabilistic Topic Models, Topic List Corpus, Topic Lists, Topic Modeling, Topic Summarization, Topic Tracking
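The abstract does not spell out the algorithm. As a rough illustration of the kind of output it describes (the topics of one news article and their proportions, looked up against a Topic List Corpus), the following Python sketch uses plain lexicon matching. It is a simplification under stated assumptions, not LDA inference and not the authors' method: the topic names, the romanized placeholder words standing in for Gurmukhi ones, and the functions `tokenize` and `topic_proportions` are all hypothetical.

```python
from __future__ import annotations

import string
from collections import Counter

# Hypothetical toy Topic List Corpus (an assumption, not the paper's data):
# each topic maps to words that commonly occur in news on that topic.
# Romanized placeholders stand in for the Gurmukhi words a real corpus holds.
TOPIC_LISTS: dict[str, set[str]] = {
    "sports": {"kriket", "maich", "khidari"},
    "politics": {"chon", "sarkar", "mantri"},
}

DANDA = "\u0964"  # the sentence terminator "।" used in Punjabi text


def tokenize(text: str) -> list[str]:
    """Split on whitespace, then strip ASCII punctuation and the danda.

    str.split() is used instead of a \\w regex because Python's \\w does not
    match combining marks such as Gurmukhi vowel signs, which would cut
    Gurmukhi words apart.
    """
    strip_chars = string.punctuation + DANDA
    tokens = (tok.strip(strip_chars) for tok in text.split())
    return [tok.lower() for tok in tokens if tok]


def topic_proportions(text: str, topic_lists: dict[str, set[str]],
                      alpha: float = 0.0) -> dict[str, float]:
    """Estimate per-topic proportions for one news article.

    Each token adds one count to every topic list that contains it; tokens
    found in no list are ignored. alpha is an optional symmetric pseudo-count:
    the result for topic k is (n_k + alpha) / (N + K * alpha).
    """
    counts: Counter[str] = Counter()
    for tok in tokenize(text):
        for topic, words in topic_lists.items():
            if tok in words:
                counts[topic] += 1
    total = sum(counts.values()) + alpha * len(topic_lists)
    if total == 0:
        # No topic word matched and no smoothing: report all-zero proportions.
        return {topic: 0.0 for topic in topic_lists}
    return {topic: (counts[topic] + alpha) / total for topic in topic_lists}


if __name__ == "__main__":
    article = "Sarkar ne kriket maich lai navi yojana di ghoshna kiti."
    print(topic_proportions(article, TOPIC_LISTS))
    # prints {'sports': 0.6666666666666666, 'politics': 0.3333333333333333}
```

The optional `alpha` pseudo-count loosely mirrors the role of LDA's Dirichlet prior on per-document topic proportions: with hard word-to-topic assignments, (n_k + α)/(N + Kα) is the posterior mean of those proportions under a symmetric Dirichlet(α) prior. A full LDA model would instead infer soft topic assignments for every word rather than relying on a fixed lookup table.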