Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 26, Pages: 1-7
Kamal Malik1* and Harsh Sadawarti2
Objectives: The curse of dimensionality and attribute relevance are matters of great concern when dealing with high-dimensional data sets or Big Data, especially when detecting projected outliers. The objective of this paper is to construct a robust and scalable model that highlights high-dimensional outliers effectively and efficiently. Methods/Analysis: To detect projected outliers, a hybrid algorithm, EKFFKMeans, is constructed from two methodologies: the Extended Kalman Filter (EKF) and Fuzzy K-Means. The EKF is used to linearize the high-dimensional data by estimating the current mean and covariance through the Kalman gain, and Fuzzy K-Means then confirms the outlying property of each data instance and categorizes it using its membership label. Findings: The EKFFKMeans model partitions the complete data set into 30 clusters to detect projected outliers, and parameters such as accuracy, cluster validity, true positive rate, false positive rate, robustness and cluster quality are calculated. Improvements: The algorithm is compared with HPStream and CluStream and is shown to perform better on these parameters.
Keywords: Clustering, Projected Outliers, Robustness, Scalability, Unsupervised
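The fuzzy K-Means stage described in the abstract can be illustrated with a minimal sketch of the standard fuzzy K-means (fuzzy c-means) updates, written in pure Python. This is not the authors' EKFFKMeans implementation: the EKF preprocessing stage is not reproduced, and the outlier rule (flag a point when its strongest membership falls below a threshold of 0.7) is an assumption chosen for illustration. The function names `fuzzy_kmeans` and `flag_outliers`, the fuzzifier `m=2.0`, and the toy data are all hypothetical.

```python
import math
import random


def fuzzy_kmeans(points, k, m=2.0, iters=300, tol=1e-6, seed=0):
    """Standard fuzzy K-means (fuzzy c-means); returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])
    power = 2.0 / (m - 1.0)
    centers = []
    for _ in range(iters):
        # Center update: mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append([sum(w[i] * points[i][t] for i in range(n)) / wsum
                            for t in range(dim)])
        # Membership update: u_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if min(d) == 0.0:
                # Point sits exactly on a center: give it crisp membership there.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((d[j] / d[l]) ** power for l in range(k))
                              for j in range(k)])
        # Stop once no membership value changes by more than tol.
        change = max(abs(a - b) for ru, rn in zip(u, new_u) for a, b in zip(ru, rn))
        u = new_u
        if change < tol:
            break
    return centers, u


def flag_outliers(memberships, threshold=0.7):
    """Hypothetical rule: flag points whose strongest membership is below threshold."""
    return [i for i, row in enumerate(memberships) if max(row) < threshold]


# Two tight groups plus one point far from both (index 10).
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
        (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
        (11, 0)]
centers, u = fuzzy_kmeans(data, k=2)
print(flag_outliers(u))  # → [10]
```

Points inside either group end up with a membership close to 1 for their own cluster, while the isolated point splits its membership roughly evenly between the two clusters and is flagged. Under this rule, a point lying midway between two clusters would also receive a low maximum membership, so a distance check against the nearest center could be combined with it.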