Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 12, Pages: 1-11
Urvashi Garg* and Vishal Goyal
Punjabi University, Patiala, NH 64, Urban Estate Phase II, Patiala - 147002, Punjab, India
*Author for correspondence
Objective: This paper presents Maulik, an automated plagiarism detection tool for Hindi documents. Many plagiarism detection tools are available for English text, but few address Hindi. Method: Maulik divides the input text into n-grams and matches them against the text in its repository as well as against documents available online. Stop word removal and stemming are applied as preprocessing steps. The value of n that works best for measuring the similarity of two Hindi documents is also determined. Cosine similarity is used to compute the similarity score. Findings: Maulik achieves a similarity score of 96.3, which is higher than the scores obtained with existing tools applied to Hindi text, namely Plagiarism checker, Plagiarism finder, Plagiarisma, Dupli checker and Quetext. Those tools compare only exact matches and ignore language-specific constraints, whereas Maulik can detect plagiarism even when a word is replaced by its root form or by a synonym. Application: Maulik is a software tool that discourages plagiarism and encourages people to develop their own writing skills.
Keywords: Cosine Similarity, Plagiarism, Stemming, Stop Word, Synonyms
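The pipeline outlined in the abstract (stop word removal, stemming, n-gram extraction, cosine similarity) can be illustrated with a minimal sketch. This is not Maulik's implementation: the stop word list, the suffix list, the naive suffix-stripping stemmer and the default n = 2 are all assumptions made for illustration, and synonym matching and online search are omitted because they would require external resources not described here.

```python
import math
from collections import Counter

# Hypothetical, very small stop word list (illustration only).
STOP_WORDS = {"है", "हैं", "का", "की", "के", "और", "में", "से", "को", "पर"}

# Hypothetical suffixes for a naive suffix-stripping stemmer.
SUFFIXES = ("ों", "ें", "ी", "े", "ा")


def stem(word):
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Split on whitespace, drop the danda and stop words, stem the rest."""
    tokens = text.replace("।", " ").split()
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all contiguous word n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine_similarity(text_a, text_b, n=2):
    """Cosine similarity between the n-gram count vectors of two texts.

    The default n=2 is an assumption, not the value reported in the paper.
    """
    vec_a = Counter(ngrams(preprocess(text_a), n))
    vec_b = Counter(ngrams(preprocess(text_b), n))
    dot = sum(count * vec_b[gram] for gram, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no n-grams in at least one text
    return dot / (norm_a * norm_b)
```

Calling `cosine_similarity(doc_a, doc_b)` returns a value between 0 and 1; a real system would likely use a curated Hindi stop word list and a proper stemmer or morphological analyser in place of the toy versions above.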