Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 44, Pages: 1-8
Reema Aswani¹, S. P. Ghrera² and Satish Chandra³
¹Department of CSE, Ajay Kumar Garg Engineering College, Ghaziabad - 201009, Uttar Pradesh, India
²Department of CSE and IT, Jaypee University of Information Technology, Waknaghat - 173234, Himachal Pradesh, India
³Department of CSE and IT, Jaypee Institute of Information Technology, Noida - 201301, Uttar Pradesh, India
Objectives: Detecting anomalies in datasets is an interesting yet challenging problem. This work proposes a hybrid model that uses meta-heuristics to detect dataset anomalies efficiently. Methods/Statistical Analysis: A distance-based modified grey wolf optimization algorithm is designed that uses the k-Nearest Neighbor algorithm to improve results. The proposed approach works with supervised datasets and reports anomalies with respect to each attribute of the dataset, based on a threshold derived from a confidence interval. Findings: The proposed approach works with both regression and classification datasets in the supervised scenario. Results are reported as the number of anomalies detected and as accuracy computed from a confusion matrix; for a classification dataset, the minority class is treated as anomalous relative to the majority class. The results were evaluated while varying the threshold and ‘k’ values; the optimal parameters depend on the dataset domain and data distribution and can be identified accordingly. Application/Improvements: The proposed approach can be used for anomaly detection on supervised datasets from different domains. It can also be extended to the unsupervised scenario by integrating it with K-means clustering.
Keywords: Data Mining, Grey Wolf Optimization, k-Nearest Neighbor, Machine Learning, Outlier Detection
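Illustrative sketch: The abstract combines a k-Nearest Neighbor distance with a confidence-interval threshold. The Python below shows that idea only and is not the authors' algorithm. It leaves out the grey wolf optimization step and the per-attribute analysis, since the abstract does not describe them. The function names, the toy data, and the cutoff of mean + 1.96 standard deviations are assumptions made for illustration.

```python
import math
import statistics


def knn_scores(points, k):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(points, k=3, z=1.96):
    """Return indices whose score exceeds mean + z * stdev of all scores.

    The z value (assumed here to be 1.96) plays the role of the threshold.
    """
    scores = knn_scores(points, k)
    cutoff = statistics.mean(scores) + z * statistics.pstdev(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]


# Hypothetical toy data: a tight cluster plus one distant point (index 8).
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8),
        (0.8, 1.0), (1.1, 1.2), (0.9, 0.9), (8.0, 8.0)]
flagged = flag_anomalies(data, k=3, z=1.96)
print(flagged)
```

Raising `z` or changing `k` alters which points are flagged, which mirrors the abstract's remark that suitable threshold and ‘k’ values depend on the dataset.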