CATs-Clustered k-Anonymization of Time Series Data with Minimal Information Loss and Optimal Re-identification Risk

J. S. Adeline Johnsana, A. Rajesh and S. Kishore Verma

doi:10.17485/ijst/2016/v9i47/101081

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 47, Pages: 1-13

Original Article

J. S. Adeline Johnsana¹*, A. Rajesh² and S. Kishore Verma³

¹Department of Computer Science and Engineering, St. Peter’s University, Avadi, Chennai - 600054, Tamil Nadu, India; [email protected]
²Department of Computer Science and Engineering, C. Abdul Hakeem College of Engineering and Technology, Melvisharam - 632509, Tamil Nadu, India; [email protected]
³Department of Computer Science and Engineering, SCSVMV University, Kanichipuram - 631561, Tamil Nadu, India; [email protected]

*Author for correspondence
J. S. Adeline Johnsana
Department of Computer Science and Engineering, St. Peter’s University, Avadi, Chennai - 600054, Tamil Nadu, India; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background/Objectives: Time series are an important type of data, widely used in diverse applications such as financial, medical, and weather analysis, and they often contain a great deal of privacy-sensitive personal information. Methods/Statistical Analysis: Protecting the privacy of time series data is a prerequisite for encouraging data holders to take part in these applications without privacy threats. The k-anonymization of time series data has gained attention in recent years; a key requirement of such an approach is to guarantee anonymization while minimizing the information loss it causes. Findings: In this article, we implement a novel methodology called CATs (Clustered k-Anonymization of Time Series Data) that applies clustering to time series data and ensures anonymization with minimal information loss while retaining data utility. The underlying idea is that time series tuples that are alike should belong to the same cluster, and these tuples are then de-identified. We formulate and propose this approach as CATs and implement it through a combination of WEKA and the ARX anonymization tool. We evaluated the solution on two benchmark time series data sets available from UCR. Our experimental results show that CATs reduces information loss by 18% to 24% compared with existing TSA (Time Series Anonymization) approaches. Applications/Improvements: Based on our experiments, we suggest that this approach can play a notable role in fields such as financial management and online medical process monitoring and management.

Keywords: Clustering, Information Loss, k-Anonymization, Privacy Preserving Data Mining, Re-Identification Risks, Time Series Data Mining
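
The cluster-then-anonymize idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' CATs implementation, which is built on WEKA and ARX. The greedy grouping, the per-time-point (min, max) envelope, the width-based loss measure, and all function names here are assumptions chosen only to show how grouping similar series into clusters of at least k members keeps the generalization narrow.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_k_members(series, k):
    """Greedy grouping (illustrative): a seed plus its k-1 nearest neighbours."""
    if k < 1 or len(series) < k:
        raise ValueError("need k >= 1 and at least k series")
    remaining = list(range(len(series)))
    clusters = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        remaining.sort(key=lambda i: euclidean(series[seed], series[i]))
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    # Fewer than k series are left: attach each to the cluster whose seed is nearest.
    for i in remaining:
        nearest = min(clusters, key=lambda c: euclidean(series[c[0]], series[i]))
        nearest.append(i)
    return clusters


def generalize(series, clusters):
    """Replace every series by its cluster's per-time-point (min, max) envelope."""
    released = [None] * len(series)
    for members in clusters:
        columns = zip(*(series[i] for i in members))
        envelope = [(min(col), max(col)) for col in columns]
        for i in members:
            released[i] = envelope
    return released


def information_loss(series, released):
    """Mean envelope width divided by the global value range (0 means no loss)."""
    values = [v for s in series for v in s]
    span = max(values) - min(values)
    if span == 0:
        return 0.0
    widths = [hi - lo for env in released for lo, hi in env]
    return sum(widths) / (len(widths) * span)


# Toy data (made up for illustration): three low-valued and three high-valued series.
series = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [1.2, 1.9, 3.1],
    [9.0, 8.0, 7.0],
    [9.2, 8.1, 7.1],
    [8.9, 7.9, 7.2],
]
clusters = cluster_k_members(series, k=3)
released = generalize(series, clusters)
loss = information_loss(series, released)
print(clusters, round(loss, 4))
```

In this toy run, every released series is indistinguishable from at least k - 1 others, and the loss stays small because similar series share a cluster. Putting all six series into a single cluster would also satisfy k = 3, but the envelope would span nearly the full value range, which is the kind of information loss that clustering is meant to avoid.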