On Authorship Attribution of Telugu Text

S. Nagaprasad; N. Krishnaveni; J. K. R. Sastry and A. Vinayababu

doi:10.17485/ijst/2016/v9i35/98735


Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 35, Pages: 1-7

Original Article

S. Nagaprasad¹*, N. Krishnaveni², J. K. R. Sastry³ and A. Vinayababu⁴

¹ Department of Computer Science, S.R.R.G.A.S.C. Karimnagar, Telangana, India; [email protected]
² Government Degree College for Women, Karimnagar - 505001, Telangana, India
³ Department of ECM, K L University, Vijayawada - 522502, Andhra Pradesh, India; [email protected]
⁴ Department of Computer Science and Engineering, JNTU, Hyderabad - 500085, Telangana, India; [email protected]
*Author for correspondence: S. Nagaprasad, Department of Computer Science; Email: [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background/Objectives: Authorship Attribution is a text classification task: it identifies the author of a given text from that author's writing style.
Methods/Statistical Analysis: Various methods, including Decision Tree, K-Nearest Neighbor, Naive Bayes and Support Vector Machine, have been used to find patterns in text databases. Classifying text patterns is deterministic, whereas attributing a text to its author is non-deterministic. This paper presents a method that recognizes text patterns from authorship features through several processing phases: preprocessing, feature extraction, feature selection, classification and, finally, identification of the author.
Findings: Authorship Attribution can be applied to a range of tasks such as forensic analysis, plagiarism detection and authorship recognition. Research on Authorship Attribution has been under way for more than a century, but the results obtained have been unsatisfactory. Several challenges have been identified, including data collection, tokenization of the text, application of Natural Language Processing tools, suitability of classification methods and identification of the features that discriminate one author from the others. From existing studies, it can be concluded that reported results are specific to individual cases and may not generalize to other Authorship Attribution settings. In the experiments, word unigram features achieved the best results among all features for every classifier. Among the classifiers, Support Vector Machine achieved the best results when compared with Decision Tree, K-Nearest Neighbor and Naive Bayes.
Application/Improvements: The method is applied to attributing the authorship of texts in a vernacular language, in this case Telugu.
Keywords: Natural Language Processing, Text Classification
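The processing phases named in the abstract (prior processing, feature extraction, feature selection, classification and author identification) can be illustrated with a small Python pipeline built on word-unigram features. This is a minimal sketch under stated assumptions, not the authors' implementation. The Unicode-range tokenizer, the document-frequency threshold `min_df`, the class name `NaiveBayesAttributor` and the toy author labels are all hypothetical. Multinomial Naive Bayes is used only because it fits in a few lines; the abstract reports that Support Vector Machine performed best.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a "word" is a run of Telugu-block characters (U+0C00-U+0C7F, which
# includes vowel signs and the virama) or of ASCII letters/digits; everything else
# (spaces, punctuation, danda) acts as a separator.
TOKEN_RE = re.compile(r"[\u0C00-\u0C7F]+|[A-Za-z0-9]+")


def preprocess(text):
    """Prior processing: lowercase any Latin script and split into word tokens."""
    return TOKEN_RE.findall(text.lower())


def unigram_features(tokens):
    """Feature extraction: word-unigram counts for one document."""
    return Counter(tokens)


def select_features(docs, min_df=1):
    """Feature selection (hypothetical criterion): keep words that occur in
    at least `min_df` training documents."""
    df = Counter()
    for counts in docs:
        df.update(counts.keys())
    return {w for w, n in df.items() if n >= min_df}


class NaiveBayesAttributor:
    """Multinomial Naive Bayes over selected word unigrams, add-one smoothing."""

    def fit(self, texts, authors, min_df=1):
        docs = [unigram_features(preprocess(t)) for t in texts]
        self.vocab = select_features(docs, min_df)
        # Class priors from the number of training documents per author.
        doc_count = Counter(authors)
        self.log_prior = {a: math.log(n / len(authors)) for a, n in doc_count.items()}
        # Per-author word counts restricted to the selected vocabulary.
        word_counts = defaultdict(Counter)
        for counts, author in zip(docs, authors):
            word_counts[author].update(
                {w: c for w, c in counts.items() if w in self.vocab})
        # Smoothed log P(word | author) for every selected word.
        self.log_lik = {}
        for author in doc_count:
            denom = sum(word_counts[author].values()) + len(self.vocab)
            self.log_lik[author] = {
                w: math.log((word_counts[author][w] + 1) / denom) for w in self.vocab}
        return self

    def predict(self, text):
        """Classification: return the author with the highest log posterior."""
        counts = unigram_features(preprocess(text))

        def score(author):
            return self.log_prior[author] + sum(
                c * self.log_lik[author][w]
                for w, c in counts.items() if w in self.vocab)

        return max(self.log_prior, key=score)


if __name__ == "__main__":
    # Toy, made-up training data; "author_A"/"author_B" are hypothetical labels.
    train_texts = ["వర్షం నది వర్షం", "నది వర్షం కవిత",
                   "పుస్తకం చదువు ప్రేమ", "ప్రేమ పుస్తకం చదువు"]
    train_authors = ["author_A", "author_A", "author_B", "author_B"]
    model = NaiveBayesAttributor().fit(train_texts, train_authors)
    print(model.predict("వర్షం నది"))      # author_A
    print(model.predict("ప్రేమ పుస్తకం"))  # author_B
```

To follow the abstract's best-performing configuration more closely, the same unigram counts could be fed to a library SVM such as scikit-learn's `LinearSVC` in place of the Naive Bayes step. The tokenizer would also need refinement for real Telugu corpora, for example to handle zero-width joiners.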