Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 46, Pages: 1-10
T. Raghunadha Reddy1*, B. Vishnu Vardhan2 and P. Vijayapal Reddy3
1Department of IT, Vardhaman College of Engineering, Shamshabad, Hyderabad - 500018, Telangana, India
2Department of CSE, JNTUH College of Engineering, Karimnagar - 505501, Telangana, India
3Department of CSE, Matrusri Engineering College, Hyderabad - 500059, Telangana, India
*Author for correspondence
T. Raghunadha Reddy, Department of IT, Vardhaman College of Engineering, Shamshabad, Hyderabad - 500018, Telangana, India
Objective: Author Profiling is a text classification technique for predicting the profile of the author of an anonymous text. Author profiles are demographic characteristics of authors such as age, gender, native language, location, educational background and personality traits. This paper proposes a new model that predicts the gender and age of authors by analyzing their writing styles in a hotel reviews dataset. Method: Most existing approaches suffer from high feature dimensionality and have difficulty capturing the relationships between features. In this paper, a Profile specific Document Weighted approach is proposed to address these drawbacks. In the proposed model, the pivoted unique term normalization measure is used to calculate the weight of each term specific to each profile group. The document weight specific to each profile group is calculated as the sum of the individual term weights for that group. A document vector is constructed from the document's weight for each group of a profile. Findings: The profile of an anonymous document is identified using the model generated by a machine learning classifier. The performance of the proposed model is evaluated with various classifiers using accuracy as the measure. The proposed approach is tested on the reviews domain to predict the gender and age group of the authors. For gender prediction, the proposed model trained with the Naïve Bayes Multinomial classifier achieves an accuracy of 91.50%. The Logistic classifier achieves an accuracy of 81.58% with the proposed model. The results achieved in this paper outperform most existing approaches. Applications: Author Profiling has become popular in several information technology enabled applications such as marketing, forensic analysis, psychology and entertainment.
Using a dataset of hotel reviews, business analysts can make strategic decisions to improve the business by identifying individual profile groups from customer reviews.
Keywords: Accuracy, Age Prediction, Author Profiling, Document Weight, Gender Prediction, Pivoted Unique Term Normalization
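The Method summary above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the code assumes the commonly cited pivoted unique normalization form, (1 + log tf) / (1 + log avg_tf) divided by ((1 - slope) * pivot + slope * u), where u is the number of unique terms in a document and pivot is the average of u over the training documents. The slope value, the whitespace tokenizer, the choice to sum over distinct terms, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

SLOPE = 0.2  # assumed value; the abstract does not state a slope


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def pivoted_unique_weights(tokens, pivot, slope=SLOPE):
    """Pivoted unique normalized weight of each term in one document."""
    counts = Counter(tokens)
    if not counts:
        return {}
    avg_tf = sum(counts.values()) / len(counts)
    norm = (1.0 - slope) * pivot + slope * len(counts)
    return {
        term: ((1.0 + math.log(tf)) / (1.0 + math.log(avg_tf))) / norm
        for term, tf in counts.items()
    }


def train_term_weights(labeled_docs):
    """Accumulate per-group term weights from (text, group) pairs."""
    tokenized = [(tokenize(text), group) for text, group in labeled_docs]
    # Pivot: average number of unique terms per training document.
    pivot = sum(len(set(toks)) for toks, _ in tokenized) / len(tokenized)
    weights = defaultdict(lambda: defaultdict(float))
    for toks, group in tokenized:
        for term, w in pivoted_unique_weights(toks, pivot).items():
            weights[group][term] += w
    return {group: dict(terms) for group, terms in weights.items()}


def document_vector(text, weights, groups):
    """One coordinate per group: the sum of that group's term weights
    over the distinct terms of the document (an assumed reading)."""
    terms = sorted(set(tokenize(text)))
    return [
        sum(weights.get(g, {}).get(t, 0.0) for t in terms) for g in groups
    ]


if __name__ == "__main__":
    # Toy reviews invented for illustration only.
    train = [
        ("lovely spa and lovely room decor", "female"),
        ("good bar and fast wifi", "male"),
    ]
    w = train_term_weights(train)
    print(document_vector("lovely spa decor", w, ["female", "male"]))
```

Under these assumptions every document is reduced to a vector with one coordinate per profile group (two for gender), which is how the approach avoids a high-dimensional term space. The resulting vectors would then be passed to whichever classifier is being evaluated.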