Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 29, Pages: 1-9
S. Siamala Devi1* and A. Shanmugam2
Background: An application gains global reach only if it is web-based, and such applications now exist in large numbers. Storing and retrieving information is always a challenging task, and retrieving relevant data from high-dimensional data is both significant and complicated. Data mining plays a major role in the information retrieval process.

Method: Grouping data makes information retrieval easier, and clustering is one of the most important data mining techniques for doing so. Document clustering partitions the entire dataset into a number of groups such that the documents within each group are highly similar to one another.

Findings: The K-means algorithm is one of the most important partitioning-based algorithms and is easy to implement. Because of its time complexity, K-means can be hybridized with the Harmony Search Method (HSM), a new meta-heuristic optimization method that imitates the music improvisation process. Term Frequency-Inverse Document Frequency (TF-IDF), Coverage Factor (CF), Concept Factorization, and constraint-based clustering have been applied to the same dataset to cluster the documents. A comparison among these methodologies shows experimentally that constraint-based clustering produces efficient clusters and outperforms the other three methods, clustering the input documents effectively.
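The TF-IDF plus K-means baseline mentioned above can be illustrated with a minimal sketch. It assumes whitespace tokenisation, a smoothed weighting tf * (log(N/df) + 1), cosine similarity on unit-length vectors, and deterministic farthest-first seeding. None of these details come from the paper, and the Harmony Search hybridisation, coverage factor, concept factorization, and constraint handling are not shown. The functions `tf_idf`, `dot`, `centroid`, and `kmeans` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """One sparse, unit-length TF-IDF vector (term -> weight) per document.

    Assumed weighting: tf * (log(N / df) + 1); the +1 keeps terms that occur
    in every document from dropping to zero. The paper's variant may differ.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(members):
    """Mean direction of the member vectors, re-normalised to unit length."""
    total = {}
    for vec in members:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else total


def kmeans(vectors, k, iterations=20):
    """Cosine K-means with farthest-first seeding; returns one label per vector."""
    # Seeding (an assumption): start from the first vector, then repeatedly
    # add the vector least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = None
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: dot(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels


docs = [
    "harmony search music improvisation",
    "music improvisation harmony memory",
    "kmeans clustering documents centroid",
    "document clustering kmeans partition",
]
labels = kmeans(tf_idf(docs), k=2)
print(labels)  # [0, 0, 1, 1]
```

Farthest-first seeding is used here only to make the toy run deterministic; random or K-means++ seeding is more common in practice, and the sensitivity of K-means to its starting centroids is one motivation for pairing it with a global search such as HSM.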
Keywords: Concept Factorization, Constraint Based Clustering, Coverage Factor, Harmony Search Method