A Survey on Removal of Duplicate Records in Database

M. Karthigha and S. Krishna Anand

doi:10.17485/ijst/2013/v6i4.11

Indian Journal of Science and Technology

Year: 2013, Volume: 6, Issue: 4, Pages: 1-6

Original Article

M. Karthigha¹* and S. Krishna Anand²

¹ PG Student, School of Computing (CSE), SASTRA University, [email protected]
² Senior Assistant Prof, School of Computing (CSE), SASTRA University, [email protected]
*Author for correspondence
M. Karthigha
PG Student, School of Computing (CSE), SASTRA University,
Email: [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Deduplication is the task of identifying the records in a repository that represent the same object or entity. The difficulty is that the same data may be represented differently in each database. When databases are merged, duplicate records arise that differ in schema, writing style, or spelling; such records are called replicas. Removing replicas from a repository yields higher-quality information and saves processing time. This paper presents a thorough analysis of similarity metrics for identifying similar fields in records, together with a set of algorithms and duplicate detection tools for detecting and removing replicas from a database.
Keywords: Similarity Metrics, Database, Indexing, Deduplication
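
As an illustration of how a similarity metric can flag replicas, the following is a minimal sketch and not a method taken from this paper. It assumes a character-based metric (Levenshtein edit distance, normalized by the longer string) applied to a single text field, and a similarity threshold of 0.8 chosen only for the example. The function names `edit_distance`, `similarity`, and `find_replicas` are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical after
    trimming whitespace and lowercasing (an assumed normalization)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def find_replicas(values, threshold=0.8):
    """Return index pairs whose field values meet the assumed threshold.
    Compares every pair, so it is only suitable for small inputs."""
    return [(i, j)
            for (i, v1), (j, v2) in combinations(enumerate(values), 2)
            if similarity(v1, v2) >= threshold]


names = ["John Smith", "Jon Smith", "Mary Jones"]
print(find_replicas(names))
```

The exhaustive pairwise comparison is quadratic in the number of records. Indexing, listed among the keywords, is the usual way to limit which pairs are compared.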