Indian Journal of Science and Technology
Year: 2021, Volume: 14, Issue: 31, Pages: 2590-2595
R Parimala1*, K Gomathi2
1Assistant Professor, Department of Computer Science, Periyar EVR College, Tiruchirapalli, 620023, Tamil Nadu, India
2Research Scholar, Department of Computer Science, Periyar EVR College, Tiruchirapalli, 620023, Tamil Nadu, India
Email: [email protected]
Received Date: 13 July 2021, Accepted Date: 26 August 2021, Published Date: 27 September 2021
Objectives: This work focuses on creating targeted, content-specific, topic-based clusters that help users discover the topics in a set of documents more efficiently. Methods/Statistical analysis: Non-negative Matrix Factorization (NMF) based models learn topics by directly decomposing the term-document matrix, a bag-of-words representation of a text corpus, into two low-rank factor matrices: the Word-Topic feature Matrix (WTOM) and the Document-Topic feature Matrix (DTOM). Topic clusters and document clusters are extracted from the resulting feature matrices. The method does not assume any statistical distribution or probability model. Experiments were carried out on a subset of the BBC Sport corpus. Findings: The experimental results indicate that the TOPNMF clusters achieved an accuracy of 100 percent on this subset. Novelty/Applications: NMF often fails to improve a given clustering result because the number of parameters increases linearly with the size of the corpus. The computational complexity of TOPNMF is lower than that of an exact decomposition such as Singular Value Decomposition (SVD).
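The decomposition described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard Lee-Seung multiplicative updates for the Frobenius objective, a hypothetical four-document toy corpus in place of the BBC Sport subset, and a simple arg-max over each DTOM row for document clustering. The names `nmf`, `wtom`, `dtom`, and `docs` are chosen here for illustration only.

```python
import random

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (terms x docs) as W (terms x k) times H (k x docs).

    Uses multiplicative updates (an assumption; the abstract does not name a solver).
    """
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(k)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_docs)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_terms)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors

# Hypothetical toy corpus (not the BBC Sport data).
docs = [
    "cricket bat wicket bowler",
    "cricket wicket innings bowler",
    "football goal striker keeper",
    "football goal penalty striker",
]
vocab = sorted({term for doc in docs for term in doc.split()})

# Bag-of-words term-document matrix: rows are terms, columns are documents.
v = [[doc.split().count(term) for doc in docs] for term in vocab]

k = 2
wtom, h, errors = nmf(v, k)   # wtom: word-topic feature matrix
dtom = transpose(h)           # dtom: document-topic feature matrix

# Document cluster = topic with the largest weight in the document's DTOM row.
doc_clusters = [max(range(k), key=lambda t: row[t]) for row in dtom]

# Topic cluster = the highest-weighted words in each WTOM column.
top_words = [
    [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -wtom[i][t])[:3]]
    for t in range(k)
]

print("document clusters:", doc_clusters)
print("top words per topic:", top_words)
```

On a real corpus one would typically weight the term-document matrix (for example with TF-IDF) and use an optimized library solver. The arg-max assignment is a simplification: the keywords mention K-means clustering, which could instead be applied to the rows of the document-topic matrix.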
Keywords: Topic cluster; Document cluster; Non-negative matrix factorization; K-means clustering; Word cloud
© 2021 Parimala & Gomathi. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)