P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2020, Volume: 13, Issue: 27, Pages: 2711-2719

Original Article

Textual Mining- Evaluation of Mann Ki Baat Repository

Received Date: 09 June 2020, Accepted Date: 14 July 2020, Published Date: 31 July 2020

Abstract

Background and Objective: Because computers and the Internet are used in nearly every region, large volumes of digital text are produced each day, and exploring and searching such massive data effectively has become a fundamental task. The main aim of the present study is to highlight recurring topics and identify the main ideas in Mann Ki Baat, a popular monthly radio address, using topic modeling.

Data and Method: The present study uses unstructured Mann Ki Baat data from January 2020 to March 2020, collected from the PMINDIA website. The program was initiated by the Honorable Prime Minister of India, Mr. Narendra Modi. The analysis applies topic modeling based on LDA (Latent Dirichlet Allocation), a widely used technique.

Findings: The results show that the method automatically extracts the main ideas and issues discussed. It also identifies the most likely topics and themes discussed in each month that left an impact on people and helped raise awareness.

Novelty: This is the first study to apply topic modeling to Mann Ki Baat. It is also the first attempt to extract the ideas discussed in a social campaign using a statistical model.

Keywords: Unstructured data; preprocessing; topic modelling; Latent Dirichlet Allocation (LDA); Mann Ki Baat
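
To make the method named in the abstract concrete, the following is a minimal sketch of LDA topic extraction using collapsed Gibbs sampling, written in plain Python. It is not the authors' implementation: the abstract does not state which software, preprocessing steps, number of topics, or hyperparameters were used. The helper names (`preprocess`, `lda_gibbs`, `top_words`), the stop-word list, the values of `alpha` and `beta`, and the toy sentences are all assumptions made for illustration; the sentences are invented and are not taken from the Mann Ki Baat transcripts.

```python
import random
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on", "our"}

def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA over a list of tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]               # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]    # tokens per (topic, word)
    topic_total = [0] * n_topics                             # tokens per topic
    assignments = []                                         # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word

def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words of each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]

# Invented toy corpus, for illustration only.
texts = [
    "Water conservation and clean rivers for our villages",
    "Farmers save water in the fields of the villages",
    "Students prepare for exams and the festival of learning",
    "Exams, learning and sports for students",
]
docs = [preprocess(t) for t in texts]
vocab, doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
```

Each row of `doc_topic` counts how many tokens of a document were assigned to each topic, which is the kind of per-document (here, per-month) topic summary the Findings describe. On real transcripts, the number of topics and the priors would need to be chosen and validated rather than fixed as above.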


Copyright

© 2020 Kandukuri, Haragopal. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)
