Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 22, Pages: 1-6
S. Gayathri1*, M. Mary Metilda2 and S. Sanjai Babu1
Background/Objective: High dimensional data is a dataset whose dimensionality ranges from a few to hundreds of dimensions. Clustering such datasets needs an efficient algorithm such as Proclus, but that algorithm has the drawback of ignoring clusters with few data points. This paper therefore proposes a clustering ensemble that combines the techniques of two clustering algorithms to obtain quality clusters even from small groups of data points. Methods/Statistical Analysis: The paper adopts a novel method that applies a density based approach on top of the Proclus algorithm so that even small groups of data points are clustered. The combined algorithm is tested on synthetic datasets. The Proclus algorithm is modified at a specific point, where the density based algorithm is applied. Findings: The proposed algorithm is found to produce more clusters than the Proclus algorithm alone. This is because Proclus ignores groups with few data points, so that only clusters with a large number of data points can exist. Once the shared nearest neighbor (SNN) density based algorithm is involved, even the small groups of data points are clustered, which paves the way for a more accurate and efficient clustering process, especially for high dimensional data. Applications/Improvements: The approach combines two efficient algorithms but is implemented in a simple way, thereby reducing the complexity of the algorithm. The proposed technique can be applied to all high dimensional datasets irrespective of their sizes and shapes.
Keywords: Density based Approach, High Dimensional Data, Proclus, SNN Algorithm
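The abstract does not spell out how the shared nearest neighbor density step is computed, so the following is only a minimal sketch of the commonly used SNN formulation, not the authors' implementation. It assumes Euclidean distance over all dimensions, and the function names and the parameters `k` and `eps` are hypothetical choices made for illustration. Two points are treated as similar only if each appears in the other's k-nearest-neighbor list, and their similarity is the number of neighbors they share; the SNN density of a point is the number of points whose similarity to it reaches `eps`.

```python
from math import dist


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        ranked = sorted(others, key=lambda j: dist(p, points[j]))
        neighbours.append(set(ranked[:k]))
    return neighbours


def snn_similarity(neighbours, i, j):
    """Number of shared neighbours, or 0 if i and j are not mutual neighbours."""
    if j not in neighbours[i] or i not in neighbours[j]:
        return 0
    return len(neighbours[i] & neighbours[j])


def snn_density(neighbours, eps):
    """For each point, count the points whose SNN similarity to it is >= eps."""
    n = len(neighbours)
    return [
        sum(1 for j in range(n)
            if j != i and snn_similarity(neighbours, i, j) >= eps)
        for i in range(n)
    ]
```

Under these assumptions, a small but tight group of points keeps a high SNN density even when it is far smaller than the other clusters, which is the property that would let such a step recover groups that a medoid based pass leaves out. Where exactly this step is applied inside Proclus is not stated in the abstract and is not modelled here.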