Framework of Data Deduplication: A Survey

A. Venish and K. Siva Sankar

doi:10.17485/ijst/2015/v8i26/80754

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 26, Pages: 1-7

Original Article

A. Venish¹* and K. Siva Sankar²

¹Computer Science and Engineering, Noorul Islam University, Kumarakoil – 629180, Tamil Nadu, India; [email protected]
²Information Technology, Noorul Islam University, Kumarakoil – 629180, Tamil Nadu, India; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

This survey explains the concept and framework of the deduplication process, its applications, the methods and technologies involved at each level of its implementation, and its limitations. Chunking algorithms such as fixed-size, variable-size and content-aware chunking determine the size of the chunks used for deduplication. To avoid a single point of failure in a distributed system, a cluster model with parallel processing can be used. Compared with a backup system that does not use deduplication, one that does can save up to 75% of its storage space.

Digital data is growing in all cloud deployment models. This growth requires more storage capacity, higher costs, more manpower and more time for data-handling tasks such as backup, replication and disaster recovery, as well as more network bandwidth to transmit the data. If data is handled efficiently, for example by removing redundant data before it is written to the storage device, this overhead can be avoided and system performance improved. Data deduplication achieves these results.

Variable-size chunking, including content-aware chunking, yields better throughput in a deduplication system than fixed-size or whole-file chunking. Inline processing, which deduplicates data before it is written, avoids the extra storage space that would otherwise be required and increases system performance. The cluster model eliminates the single point of failure in a distributed deduplication system. Overall, deduplication saves storage space, avoids the overhead of handling unnecessary data, and reduces resource utilization at minimal cost.
Keywords: Chunking, Cluster Deduplication, Data Deduplication, Duplicate Detection, Fingerprint Calculation
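The abstract names fixed-size and content-aware (variable-size) chunking, fingerprint calculation and duplicate detection without showing how they fit together. The Python sketch below is an illustration under our own assumptions, not the authors' method: chunks are fingerprinted with SHA-256, content-defined boundaries come from a simple polynomial rolling hash (a stand-in for the Rabin fingerprinting commonly used for this purpose), and the function names and the `window`, `mask`, `min_size` and `max_size` parameters are hypothetical. It also makes the savings arithmetic explicit: savings = 1 - physical/logical, so the quoted 75% corresponds to a 4:1 deduplication ratio.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Fixed-size chunking: cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cdc_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
               min_size: int = 16, max_size: int = 256) -> list[bytes]:
    """Content-defined (variable-size) chunking, a sketch with hypothetical parameters.

    A polynomial rolling hash over the last `window` bytes is updated per byte;
    a boundary is declared where (hash & mask) == 0, subject to min/max chunk sizes.
    """
    base, mod = 257, (1 << 61) - 1
    out_weight = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * out_weight) % mod  # drop the oldest byte
        h = (h * base + byte) % mod                         # append the newest byte
        length = i + 1 - start
        at_content_boundary = i + 1 >= window and (h & mask) == 0
        if length >= min_size and (at_content_boundary or length >= max_size):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_store(chunks: list[bytes], store: dict[str, bytes]) -> list[str]:
    """Fingerprint each chunk with SHA-256 and store only chunks not seen before.

    Returns the file's recipe: the ordered fingerprints needed to rebuild it.
    """
    recipe = []
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # duplicate detection: one stored copy per fingerprint
        recipe.append(fp)
    return recipe


def space_savings(logical: int, physical: int) -> float:
    """Fraction of storage saved: 1 - physical / logical."""
    return 1 - physical / logical


def backup_savings(chunker) -> float:
    """Back up a random 8 KiB file twice, the second time with one byte
    inserted at the front, and return the space savings across both backups."""
    rng = random.Random(42)
    v1 = bytes(rng.getrandbits(8) for _ in range(8192))
    v2 = b"X" + v1
    store: dict[str, bytes] = {}
    for version in (v1, v2):
        dedup_store(chunker(version), store)
    physical = sum(len(c) for c in store.values())
    return space_savings(len(v1) + len(v2), physical)


if __name__ == "__main__":
    print("4:1 ratio as savings:", space_savings(400, 100))
    print("fixed-size chunking:", round(backup_savings(fixed_chunks), 3))
    print("content-defined chunking:", round(backup_savings(cdc_chunks), 3))
```

With these assumed parameters, the fixed-size chunker should report savings near zero for the second backup, because the one-byte insertion moves every later chunk boundary, whereas the content-defined chunker should come close to the 50% ceiling for two copies, because its boundaries depend only on local content and realign after the insertion. This boundary-shift behaviour is one common reason variable-size chunking is preferred over fixed-size chunking.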