Year: 2022, Volume: 15, Issue: 37, Pages: 1868-1875

Original Article

Understanding Vaccine Hesitancy with Application of Latent Dirichlet Allocation to Reddit Corpora

Received Date: 25 March 2022, Accepted Date: 04 September 2022, Published Date: 26 September 2022


Objectives: To understand vaccine-hesitant behavior among cohorts of social media users; to identify the underlying factors that contribute to vaccine hesitancy; and to help policymakers make informed decisions that improve the success rate of vaccination campaigns.
Methods: Latent Dirichlet Allocation (LDA), a popular topic modeling technique, was used to extract topics from a Reddit corpus of vaccine hesitancy discussions. The corpus was collected through Reddit's API using PRAW (the Python Reddit API Wrapper) and contained 2996 comments retrieved from the following subreddits: r/askreddit, r/antivax, r/antivaccine, and r/AntiVaxxers. Determinants of vaccine hesitancy were generated from the corpus using standard LDA and Mallet LDA models.
Findings: By applying Latent Dirichlet Allocation, we identified underlying factors that contribute, directly or indirectly, to vaccine-hesitant behavior. Notable contributing factors include, but are not limited to, the rapture, a depopulation agenda, and the immigrant crisis at the southern US border. Because most of the input in our dataset came from people living in the United States, these results are plausible; however, the same factors may or may not be contributors worldwide. The topics generated by standard LDA were less precise and less comprehensible than those generated by the Mallet LDA model. Although several studies have examined vaccine-hesitant behavior, none report how political and religious factors contribute to it. While it is well known at a surface level that religious and political factors contribute to vaccine hesitancy, our unique Reddit corpus and the methodology described above allowed us to identify specific, novel factors that have not been reported elsewhere.
Novelty: Research identifying the factors that contribute to vaccine hesitancy is fairly common, especially from the period when the coronavirus pandemic was at its peak. Existing studies predominantly use surveys and other traditional methodologies to identify these factors. Natural Language Processing techniques such as Latent Dirichlet Allocation can surface latent variables that the aforementioned methodologies cannot identify. This research is fairly novel both in the methodology adopted and in the results obtained.
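To illustrate how LDA infers topics from tokenized comments, the following is a minimal plain-Python sketch of collapsed Gibbs sampling, the inference approach used by the MALLET toolkit. It is not the authors' pipeline: the helper names `lda_gibbs` and `top_words`, the hyperparameters (alpha=0.1, beta=0.01), the iteration count, and the toy corpus are all hypothetical, and tokenization and stop-word removal are assumed to have been done already.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: fit LDA with collapsed Gibbs sampling.

    docs: list of documents, each a list of preprocessed tokens.
    Returns (vocab, nkw, ndk): the vocabulary, topic-word counts,
    and document-topic counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]      # tokens in document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Randomly assign each token to an initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(z_doc)

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # Full conditional, up to a normalizing constant:
                # p(z_i = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1
    return vocab, nkw, ndk


def top_words(vocab, nkw, n=5):
    """Return the n most frequent words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in nkw
    ]


# Invented toy corpus standing in for preprocessed Reddit comments.
docs = [
    "vaccine mandate government freedom mandate vaccine".split(),
    "government freedom mandate choice vaccine freedom".split(),
    "rapture prophecy church faith rapture end".split(),
    "faith church prophecy end rapture faith".split(),
]
vocab, nkw, ndk = lda_gibbs(docs, num_topics=2, iterations=100, seed=1)
for k, words in enumerate(top_words(vocab, nkw, n=4)):
    print(f"topic {k}: {words}")
```

Each topic is then read off from its most frequent words, which is how labels such as the ones reported above would be assigned. On a real corpus, an established implementation (MALLET itself, or a library LDA) would normally replace a hand-written sampler like this one.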

Keywords: Vaccine hesitancy; Coronavirus; Latent Dirichlet Allocation; Bayesian Statistics; Reddit


