Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 28, Pages: 1-8
S. Suguna1*, V. Sundaravadivelu2 and B. Gomathi1
1 Department of Computer Science, Sri Meenakshi Govt. Arts College for Women (A), Madurai – 625002, Tamil Nadu, India; [email protected]; [email protected]
2 PG and Research Department of Computer Science, Thiru A. Govindasamy Govt. Arts College, Tindivanam – 604002, Tamil Nadu, India; [email protected]
Background/Objectives: In this paper we analyze several issues in clustering and text mining. The collected documents are preprocessed and grouped using a new algorithm we propose, based on a position method. Our results show that the proposed color based constraint (CBC) clustering algorithm outperforms the K-Means and SOM algorithms in terms of time and reliability.
Methods/Statistical Analysis: We identified the problem by analyzing existing work in articles from reputed journals and from national and international conferences. We propose a new methodology for the document grouping process and for the color based constraint clustering process. Clustering is treated here as a semi-supervised learning problem that deals with finding structure in a collection of unlabelled data. In this work the collected documents are preprocessed by stop word removal and stemming, and the words are then grouped according to their similarity using color code constraints. The performance of the SOM, K-Means, and color based constraint algorithms is analyzed for text document collections of any kind.
Findings: The performance of the proposed CBC algorithm is compared with that of the SOM and K-Means algorithms with respect to time based frequency and the reliability of the retrieved documents. The time needed to process a given number of documents is analyzed. The reliability of the retrieved documents is assessed from the number of documents and the frequency measurement. The proposed color based constraint clustering algorithm outperforms the K-Means and SOM algorithms in terms of both time and reliability.
Application/Improvements: Our work is useful for efficient information retrieval. In future, this work can be extended to maximize the grouping (clustering) of words in a document under color based constraints with minimum latency, increasing clustering performance for efficient text mining.
Keywords: Color Based Constraint, Clustering, Information Retrieval, Semi-Supervised Clustering Technique, Text Mining
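The preprocessing stage named in the abstract (stop word removal followed by stemming) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the color based constraint algorithm itself. The stop word list, the suffix-stripping rule (a crude stand-in for a real stemmer such as Porter's), and the names STOP_WORDS, crude_stem, preprocess, and group_by_stem are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop word list (assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Suffixes tried in order; a crude stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def preprocess(text):
    """Stop word removal followed by stemming."""
    return [crude_stem(t) for t in tokenize(text)]


def group_by_stem(text):
    """Group surface words that share a stem (a simple notion of similarity)."""
    groups = defaultdict(list)
    for token in tokenize(text):
        groups[crude_stem(token)].append(token)
    return dict(groups)


if __name__ == "__main__":
    print(preprocess("The documents are clustered and grouping is tested"))
    print(group_by_stem("clusters clustering clustered"))
```

Grouping by shared stem is only one simple way to bring similar words together. The constraint-based grouping described in the abstract would replace group_by_stem with its own similarity rule.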