P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2015, Volume: 8, Issue: Supplementary 9, Pages: 1-5

Original Article

Using Part-of-Speech Sequences Frequencies in a Text to Predict Author Personality: a Corpus Study


Objectives: The aim of the paper is to examine how effectively part-of-speech (POS) bigram frequencies can be used to predict personality from text.
Methods/Analysis: The study used 96 texts randomly selected from “Personality”, a corpus of Russian students’ essays. Using NLP methods, the frequencies of POS bigrams in each text were computed (227 bigram types were identified overall), and only the bigrams found in at least 75% of the analyzed texts were retained. Correlations were then computed between POS bigram frequencies in the texts and the authors' gender and personality traits (measured with the McCrae and Costa questionnaire).
Findings: Some researchers report a consistently positive contribution of POS n-grams to models for personality prediction from text, but these conclusions were drawn from the analysis of English-language texts. Our findings confirm the effectiveness of POS bigrams in predicting personality from texts in Russian. Gender correlated with the frequency of the prep-noun bigram, with male authors typically scoring higher on this parameter. For Neuroticism, correlations were identified with the adj-noun, noun-prep and prep-noun bigrams. Openness correlated with the frequency of the noun-prep bigram. There were also weak correlations between Extraversion scores and the frequency of the pers-vfin bigram, and between Agreeableness scores and the frequencies of the pers-vfin and ptcl-vfin bigrams. The paper points out that the resulting dependencies should be interpreted in light of data from psychology, psycholinguistics and neurolinguistics.
Novelty of the study: To our knowledge, this is the first study to use the frequencies of specific POS bigrams as parameters for profiling the authors of Russian-language written texts.
Conclusion/Application: The study, conducted on Russian texts, confirmed earlier findings for English regarding the usefulness of POS n-grams in the author profiling task. Further investigations on different text corpora are needed.
Keywords: Author, Authorship Attribution, Corpus Linguistics, Part-of-Speech Bigrams, Personality Prediction from Text, Text
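
The procedure outlined under Methods/Analysis (count POS bigrams per text, keep those present in at least 75% of texts, then correlate their frequencies with author characteristics) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the tagger, the frequency normalization, or the correlation coefficient, so the sketch assumes texts that are already POS-tagged, relative frequencies per text, and Pearson's r. All function names and the toy data are hypothetical; the tag labels merely mimic those mentioned in the abstract.

```python
from collections import Counter
from math import sqrt


def bigram_frequencies(tags):
    """Relative frequency of each adjacent POS-tag pair in one tagged text.

    Assumption: frequencies are normalized by the number of bigrams in the text.
    """
    pairs = Counter(f"{a}-{b}" for a, b in zip(tags, tags[1:]))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {bigram: count / total for bigram, count in pairs.items()}


def select_common_bigrams(per_text, min_share=0.75):
    """Keep bigrams that occur in at least `min_share` of the texts."""
    presence = Counter(bigram for freqs in per_text for bigram in freqs)
    threshold = min_share * len(per_text)
    return sorted(b for b, n in presence.items() if n >= threshold)


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed; the abstract names no coefficient)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sd_x * sd_y)


# Hypothetical toy corpus: each text is a list of POS tags from some tagger.
tagged_texts = [
    ["prep", "noun", "vfin", "prep", "noun"],
    ["pers", "vfin", "prep", "noun"],
    ["adj", "noun", "vfin"],
    ["prep", "noun", "prep", "noun", "vfin"],
]
# Hypothetical trait scores, one per author (invented for illustration only).
trait_scores = [12.0, 20.0, 25.0, 14.0]

per_text = [bigram_frequencies(tags) for tags in tagged_texts]
common = select_common_bigrams(per_text, min_share=0.75)

# Correlate each retained bigram's frequency with the trait score.
correlations = {
    bigram: pearson([freqs.get(bigram, 0.0) for freqs in per_text], trait_scores)
    for bigram in common
}
```

With a binary variable such as gender coded as 0/1, the same `pearson` function yields the point-biserial correlation, which is one plausible way to relate gender to a bigram frequency. A real replication would also need significance testing, which this sketch omits.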

