Design and Implementation of GPU-based File Similarity Evaluation System

Yeong Dae Kim; Byung Kwan Kim; Sung Bong Jang; Saang Yong Uhmn and Young Woong Ko

doi:10.17485/ijst/2016/v9i20/94694

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 20, Pages: 1-6

Original Article

Yeong-Dae Kim¹, Byung-Kwan Kim¹, Sung-Bong Jang², Saang-Yong Uhmn¹ and Young Woong Ko¹*

¹Department of Computer Engineering, College of Information and Electronic Engineering, Hallym University, Chuncheon, Gangwon, 200-702, Republic of Korea; [email protected], [email protected], [email protected], yuko@hallym.ac.kr

²Department of Computer Software Engineering, Kumoh National Institute of Technology, 61 Daehak-ro, Gumi, Kyoung-Buk, 730-701, Republic of Korea; [email protected]

*Author for correspondence
Young Woong Ko, Department of Computer Engineering, College of Information and Electronic Engineering, Hallym University, Chuncheon, Gangwon, 200-702, Republic of Korea; yuko@hallym.ac.kr

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background/Objectives: Storage and backup systems are now widely used, and the amount of duplicated data has increased drastically. To reduce storage size and use network bandwidth efficiently, we propose a deduplication system and a file similarity measurement scheme based on GPGPUs, which are applied to the similarity measurement to speed up computation. Methods/Statistical Analysis: To cope with the problem that accompanies parallelizing the measurement, we compare two implementations, one using shared memory and one using preprocessing. In addition, we propose an alternative to the Rabin fingerprinting algorithm that lessens the computational burden on the GPUs. We compare the systems by elapsed time on several files. Findings: First, experiments showed that preprocessing was slightly faster than the shared memory scheme for handling the overlapped region of consecutive data segments assigned to different cores; this region must be shared by two cores for fingerprinting. With GPGPU parallelization and the preprocessing technique applied to file similarity measurement, the proposed system outperformed systems running on a multi-core CPU, and the speedup was larger for bigger files. In addition, adopting the alternative to the Rabin fingerprinting algorithm made the system three times faster. The alternative removes the computational burden of Rabin fingerprinting while providing results comparable to those of the system that uses it. Improvements: The procedure will benefit deduplication systems in determining file similarity and finding duplicated regions of two files. We achieved a speedup in file similarity measurement through parallelization on GPGPUs, using two methods for handling the overlaps of consecutive data segments and an alternative fingerprinting algorithm.

Keywords: File Similarity, Fingerprinting, Parallelization on GPUs
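
The overlap problem described in the abstract can be illustrated with a minimal CPU-side sketch. It is not the authors' GPU implementation. It assumes a simple polynomial rolling hash in the spirit of Rabin fingerprinting (not true Rabin fingerprinting over GF(2), and not the alternative algorithm proposed here), and it assumes a Jaccard index over fingerprint sets as the similarity measure, which the abstract does not specify. The names `window_hashes`, `segmented_hashes`, `similarity`, and the constants `WINDOW`, `BASE`, and `MOD` are all hypothetical. The point is that when data is split into segments for independent workers, each segment must be extended by `window - 1` bytes of its successor, or the windows that straddle a boundary are lost.

```python
from typing import List

WINDOW = 8            # sliding-window width in bytes (assumed value)
BASE = 257            # polynomial base (assumed value)
MOD = (1 << 61) - 1   # large prime modulus (assumed value)


def window_hashes(data: bytes, window: int = WINDOW) -> List[int]:
    """Return one rolling hash per window position, computed sequentially."""
    if len(data) < window:
        return []
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % MOD
    out = [h]
    for i in range(window, len(data)):
        # Drop the oldest byte, shift, and append the newest byte.
        h = ((h - data[i - window] * top) * BASE + data[i]) % MOD
        out.append(h)
    return out


def segmented_hashes(data: bytes, segment: int, window: int = WINDOW) -> List[int]:
    """Hash fixed-size segments independently, as separate workers would.

    Each segment is extended by window - 1 bytes from the next segment so
    that windows straddling a segment boundary are still fingerprinted.
    """
    out: List[int] = []
    for start in range(0, len(data), segment):
        piece = data[start:start + segment + window - 1]
        out.extend(window_hashes(piece, window))
    return out


def similarity(a: bytes, b: bytes, segment: int = 64) -> float:
    """Jaccard index of the two fingerprint sets (assumed similarity measure)."""
    fa = set(segmented_hashes(a, segment))
    fb = set(segmented_hashes(b, segment))
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)
```

In this sketch the segmented pass produces exactly the same fingerprints as the sequential pass for any segment size, which is the property a parallel implementation has to preserve, whether the overlapped bytes are provided through shared memory or copied in a preprocessing step.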