An Empirical Analysis on Effect of Data Expansion for Clustering Low Dimensional Data

Smita Prava Mishra, Debahuti Mishra and Srikanta Patnaik

doi:10.17485/ijst/2016/v9i3/71221

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 3, Pages: 1-21

Original Article

Smita Prava Mishra¹, Debahuti Mishra²* and Srikanta Patnaik²

¹Computer Science and Information Technology, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar – 751003, Odisha, India
²Computer Science and Engineering, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar – 751003, Odisha, India; [email protected]

*Author For Correspondence
Debahuti Mishra
Computer Science and Engineering, Institute of Technical Education and Research, Siksha O Anusandhan University, Bhubaneswar – 751003, Odisha, India; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Researchers in the data mining domain often presume that the study of traditional clustering techniques is approaching saturation. However, a closer look at those techniques reveals many facets that could lead to further applications in diverse domains. In clustering, the attributes of the data provide the information needed for data segregation. Some real-world data sets have few attributes yet carry considerable information, and may be of interest for certain applications. Because they have so few attributes, such data may not be well separated by any clustering technique. Data expansion techniques construct a larger number of attributes from a smaller number of attributes. By applying these techniques during data preprocessing, an expanded data set can be constructed from a given data set. The current work shows that the expanded data at times yield better clustering results than the original data. This paper is an attempt to empirically evaluate and analyze the effects of data expansion on clustering results, where the validity of the results is established through internal indexing techniques and probabilistic validation measures.

Keywords: Cluster Analysis, Cluster Validity, Data Expansion, Internal Indexing, Probabilistic Measures
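
The workflow outlined in the abstract (expand the attributes, cluster, then score the result with an internal index) can be illustrated with a small sketch. It does not reproduce the paper's method: the abstract does not name the expansion techniques used, so a degree-2 polynomial expansion is assumed here, k-means stands in for the clustering step, and the silhouette coefficient stands in for the internal index. The toy data and the helper names `expand`, `kmeans` and `silhouette` are hypothetical.

```python
import math
import random
from itertools import combinations_with_replacement


def expand(rows):
    """Append every degree-2 monomial x_i * x_j (i <= j) to each row.

    Assumption: polynomial expansion is only one possible data expansion;
    the paper's own techniques are not specified in the abstract.
    """
    out = []
    for r in rows:
        pairs = combinations_with_replacement(range(len(r)), 2)
        out.append(list(r) + [r[i] * r[j] for i, j in pairs])
    return out


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(rows, k, iters=50, seed=0):
    """Plain Lloyd-style k-means; returns one cluster label per row."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center for every row.
        labels = [min(range(k), key=lambda c: dist(r, centers[c])) for r in rows]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(rows, labels):
    """Mean silhouette coefficient, an internal validity index in [-1, 1]."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; report a neutral value
    scores = []
    for i, r in enumerate(rows):
        own = [dist(r, rows[j]) for j in range(len(rows))
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster scores 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance within the row's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(r, rows[j]) for j in idx) / len(idx)
            for idx in ([j for j in range(len(rows)) if labels[j] == c]
                        for c in clusters if c != labels[i])
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical two-attribute toy data, not taken from the paper.
data = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [3.0, 3.2], [3.2, 2.9], [2.9, 3.1]]
expanded = expand(data)  # 2 attributes become 2 + 3 = 5

s_raw = silhouette(data, kmeans(data, k=2))
s_exp = silhouette(expanded, kmeans(expanded, k=2))
print(f"silhouette on original data: {s_raw:.3f}")
print(f"silhouette on expanded data: {s_exp:.3f}")
```

Comparing the two printed scores mirrors the kind of comparison the paper makes, but nothing about this toy example indicates which data set will score higher in general; as the abstract notes, expansion helps only at times.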