Text Document Clustering and Classification using K-Means Algorithm and Neural Networks

Ramanpreet Kaur and Amandeep Kaur

doi:10.17485/ijst/2016/v9i40/97722

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 40, Pages: 1-5

Original Article

Ramanpreet Kaur* and Amandeep Kaur

Department of CSE, Chandigarh University, Gharuan, Mohali - 140413, Punjab, India; [email protected]
[email protected]
*Author for correspondence
Ramanpreet Kaur
Department of CSE
Email: [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

This paper presents the results of a study of several general document clustering and classification methods.
Objectives: This research aims to improve clustering by creating a system that reduces the time needed to retrieve text documents from clusters.
Method: We propose a method that combines clustering and classification, using k-means with feed-forward neural networks, implemented in MATLAB. K-means clusters the text documents and the neural network classifies them.
Findings: Earlier techniques include semi-supervised models for labelled text (Partially Labeled Dirichlet Allocation and the Partially Labeled Dirichlet Process), genetic algorithms, Gaussian distributions, hybrid genetic algorithms, fast global k-means, and k-means clustering. Each has its merits and demerits, but they share one drawback: they are very time-consuming. The main aim of this work is therefore to develop a model based on both supervised and unsupervised techniques to determine the similarity between documents.
Improvements: To address the time cost, we used neural networks for classification and k-means for clustering, giving a model that draws on both supervised and unsupervised techniques to determine the similarity between documents.
Keywords: Artificial Neural Network, Cosine Similarity and Data Mining, K-means Algorithm, Similarity Measure Function, Text Document Clustering
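
The two-stage idea in the abstract (k-means to cluster documents, then a feed-forward neural network to classify them) can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation, whose details the abstract does not give. It is a Python illustration under stated assumptions: documents are represented as unit-length term-frequency vectors, k-means assigns by cosine similarity, and the cluster labels are used as training targets for a one-hidden-layer network. The toy corpus, the function names (tf_vectors, kmeans_cosine, train_ffnn, predict), the initial centroid indices, the hidden-layer size, and the learning rate are all hypothetical choices made for the example.

```python
import math
import random
from collections import Counter

def tf_vectors(docs):
    """Term-frequency vectors over a shared vocabulary, scaled to unit length."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    vectors = []
    for doc in tokens:
        counts = Counter(doc)
        vec = [float(counts[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def kmeans_cosine(vectors, init_indices, max_iter=20):
    """K-means that assigns each document to its most cosine-similar centroid."""
    centroids = [vectors[i][:] for i in init_indices]  # assumed: caller picks the seeds
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def forward(x, params):
    """Forward pass: tanh hidden layer, softmax output."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return h, [v / total for v in e]

def train_ffnn(X, y, n_classes, hidden=4, epochs=500, lr=0.2, seed=0):
    """Fit a one-hidden-layer network with per-sample gradient descent."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_classes)]
    b1, b2 = [0.0] * hidden, [0.0] * n_classes
    params = (W1, b1, W2, b2)
    for _ in range(epochs):
        for x, target in zip(X, y):
            h, p = forward(x, params)
            # Cross-entropy gradient at the logits: softmax output minus one-hot target.
            dz = [p[c] - (1.0 if c == target else 0.0) for c in range(n_classes)]
            # Backpropagate through tanh before W2 is modified.
            dh = [sum(dz[c] * W2[c][j] for c in range(n_classes)) * (1.0 - h[j] ** 2)
                  for j in range(hidden)]
            for c in range(n_classes):
                b2[c] -= lr * dz[c]
                for j in range(hidden):
                    W2[c][j] -= lr * dz[c] * h[j]
            for j in range(hidden):
                b1[j] -= lr * dh[j]
                for i in range(n_in):
                    W1[j][i] -= lr * dh[j] * x[i]
    return params

def predict(x, params):
    _, p = forward(x, params)
    return max(range(len(p)), key=p.__getitem__)

# Hypothetical toy corpus: three finance documents, three sports documents.
docs = [
    "stock market shares trading",
    "market shares fall stock",
    "stock trading prices market",
    "football match goal team",
    "team wins football match",
    "goal scored football team",
]
vocab, X = tf_vectors(docs)
labels = kmeans_cosine(X, init_indices=(0, 3))   # unsupervised step
print(labels)  # [0, 0, 0, 1, 1, 1]
params = train_ffnn(X, labels, n_classes=2)      # supervised step on cluster labels
predicted = [predict(x, params) for x in X]
print(predicted)
```

In this sketch the network is trained on the labels that k-means produced, so classifying a new document takes one forward pass instead of a comparison against every stored document. Whether the paper couples the two stages in this way is an assumption; the abstract states only that k-means is used for clustering and a neural network for classification.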