P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2021, Volume: 14, Issue: 31, Pages: 2590-2595

Original Article

TOPNMF: Topic based Document Clustering using Non-negative Matrix Factorization

Received Date: 13 July 2021, Accepted Date: 26 August 2021, Published Date: 27 September 2021

Abstract

Objectives: This work focuses on creating targeted, content-specific, topic-based clusters that help users discover the topics in a set of documents more efficiently. Methods/Statistical analysis: Non-negative Matrix Factorization (NMF) based models learn topics by directly decomposing the term-document matrix, a bag-of-words representation of a text corpus, into two low-rank factor matrices: the Word-Topic feature Matrix (WTOM) and the Document-Topic feature Matrix (DTOM). Topic clusters and document clusters are extracted from the resulting feature matrices. The method does not assume any statistical distribution or probabilistic model. Experiments were carried out on a subset of the BBC Sport corpus. Findings: In these experiments, the TOPNMF clusters reached an accuracy of 100 percent. Novelty/Applications: NMF often fails to improve a given clustering result because the number of parameters increases linearly with the size of the corpus. The computational complexity of TOPNMF is lower than that of an exact decomposition such as Singular Value Decomposition (SVD).

Keywords: Topic cluster; Document cluster; Non-negative matrix factorization; K-means clustering; Word cloud
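
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The toy term-document counts, the term list, the function names, and the use of Lee-Seung multiplicative updates are all assumptions made for illustration. Documents are assigned to clusters by their largest DTOM weight, which is a simpler stand-in for the K-means step named in the keywords.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (terms x docs) into w (terms x rank)
    and h (rank x docs) using multiplicative updates (an assumed solver)."""
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    # Strictly positive random start so no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_docs)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_terms)]
    return w, h

# Hypothetical toy corpus: rows are terms, columns are four short documents.
terms = ["goal", "striker", "penalty", "serve", "racket", "volley"]
term_doc = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
]

n_topics = 2
wtom, dtom = nmf(term_doc, rank=n_topics)  # word-topic and document-topic factors

# Topic clusters: the three highest-weighted terms of each WTOM column.
topic_terms = [
    [terms[i] for i in sorted(range(len(terms)), key=lambda i: -wtom[i][k])[:3]]
    for k in range(n_topics)
]

# Document clusters: each document goes to its highest-weighted topic in DTOM.
doc_clusters = [
    max(range(n_topics), key=lambda k: dtom[k][j])
    for j in range(len(term_doc[0]))
]

print(topic_terms)
print(doc_clusters)
```

On a matrix with two non-overlapping vocabularies like this one, the two football documents and the two tennis documents would be expected to fall into separate clusters. Which topic index each group receives depends on the random initialization.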

Copyright

© 2021 Parimala & Gomathi. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee).
