Cosine Similarity with Centroid Implication for Text Clustering of Document Files

Anubhuti Singh, Chetna Dabas and J. P. Gupta

doi:10.17485/ijst/2016/v9i48/105232

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 48, Pages: 1-4

Original Article

Anubhuti Singh, Chetna Dabas^* and J. P. Gupta

Jaypee Institute of Information and Technology, Noida - 201301, Uttar Pradesh, India; [email protected], [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: To perform pairwise text comparison over a large dataset using the cosine similarity metric and the adjacent method, and to develop a model for parallel processing of very large data using distributed algorithms on parallel clusters. Methods/Statistical Analysis: This work applies a map-reduce-based K-means algorithm to document files, with an effective number of clusters, in a Java environment. Text documents are classified using a feature selection method that relies on cosine similarity. An efficient number of clusters is obtained within a fixed number of iterations. Findings: The proposed work presents an approach to classifying text documents using a feature selection method. Application/Improvements: With cosine similarity methods, the retrieved results are improved and acceptable.

Keywords: Cosine Similarity, Document Files, Text Clustering
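
As an illustration of the two ideas named in the abstract, the following is a minimal single-machine sketch in Java of cosine similarity, cos(a, b) = (a · b) / (|a| |b|), and of one K-means-style step that assigns a document to its most similar centroid. It is not the authors' implementation: the class and method names are hypothetical, raw term counts are assumed as the document representation (the abstract does not state the weighting), and the map-reduce distribution described in the abstract is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the raw term-count weighting are assumptions.
class CosineCentroidSketch {

    // Build a sparse term-count vector from lower-cased, whitespace-split text.
    static Map<String, Double> termFrequencies(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1.0, Double::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity; returns 0 when either vector is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Assignment step: index of the centroid most similar to the document.
    static int nearestCentroid(Map<String, Double> doc,
                               List<Map<String, Double>> centroids) {
        int best = -1;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double sim = cosine(doc, centroids.get(i));
            if (sim > bestSim) {
                bestSim = sim;
                best = i;
            }
        }
        return best;
    }

    // Update step: component-wise mean of the vectors assigned to a cluster.
    static Map<String, Double> centroid(List<Map<String, Double>> members) {
        Map<String, Double> mean = new HashMap<>();
        for (Map<String, Double> m : members) {
            for (Map.Entry<String, Double> e : m.entrySet()) {
                mean.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        int n = members.size();
        mean.replaceAll((term, total) -> total / n);
        return mean;
    }

    public static void main(String[] args) {
        Map<String, Double> doc = termFrequencies("text cluster");
        List<Map<String, Double>> centroids = List.of(
                termFrequencies("cluster text cluster"),
                termFrequencies("map reduce java"));
        System.out.println("assigned cluster: " + nearestCentroid(doc, centroids));
    }
}
```

In a map-reduce setting, the assignment step would typically run in the mappers and the centroid update in the reducers, repeated for a fixed number of iterations; that split is a common design for distributed K-means and is assumed here rather than taken from the paper.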