
A Machine Learning based Classification for Social Media Messages


  • School of Computing, SASTRA University, Thanjavur – 613 401, Tamil Nadu, India


Social media act as a medium of communication among people and allow users to exchange information in a useful way. Twitter is one of the most popular social networking services; its users post and read short messages called tweets. Tweets are useful in the biomedical, research and health care fields. In this work, data are extracted from Twitter. Twitter data cannot be classified directly because they contain noisy information, so this noise is first removed by preprocessing. The resulting plain text is then classified into health and non-health data using the CART (Classification and Regression Trees) algorithm. Classification performance is analyzed in terms of precision, error rate and accuracy. The results are compared with those of the Naïve Bayesian classifier, and the proposed method performs better than the Naïve Bayesian classifier. The method performs well on large data sets and is simple and effective. It yields high classification accuracy, and the classified data could be used for further mining.
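The abstract names the pipeline (preprocessing, CART classification, evaluation by precision, error rate and accuracy) but not its details. The Python sketch below illustrates one way such a pipeline could look, under assumptions that are ours and not the authors': the noise filters (URLs, @mentions, non-letters), the binary word-presence features, the Gini split criterion, the `health`/`non-health` label strings, the choice of `health` as the positive class for precision, and the toy tweets are all hypothetical. The tree is grown greedily without CART's cost-complexity pruning, so this is a simplified sketch rather than the method evaluated in the paper.

```python
import re
from collections import Counter

# Assumed noise filters: the paper does not list its exact preprocessing rules.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(tweet):
    """Lowercase a tweet, strip URLs, @mentions and non-letters; return its word set."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return set(text.split())


def gini(labels):
    """Gini impurity, the split criterion CART uses for classification trees."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(rows, labels, vocab):
    """Greedily split on the word whose presence/absence gives the lowest
    weighted Gini impurity. Simplified: no pruning step is applied."""
    majority = Counter(labels).most_common(1)[0][0]
    parent = gini(labels)
    if parent == 0.0:
        return majority  # pure node becomes a leaf
    best = None
    for word in sorted(vocab):  # sorted for deterministic tie-breaking
        left = [i for i, r in enumerate(rows) if word in r]
        right = [i for i, r in enumerate(rows) if word not in r]
        if not left or not right:
            continue
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(rows)
        if best is None or score < best[0]:
            best = (score, word, left, right)
    if best is None or best[0] >= parent:
        return majority  # no split reduces impurity
    _, word, left, right = best
    return (word,
            build_tree([rows[i] for i in left], [labels[i] for i in left], vocab),
            build_tree([rows[i] for i in right], [labels[i] for i in right], vocab))


def predict(tree, words):
    """Walk from the root: take the first child if the split word is present."""
    while isinstance(tree, tuple):
        word, present, absent = tree
        tree = present if word in words else absent
    return tree


def evaluate(y_true, y_pred, positive="health"):
    """Precision for the positive class, plus accuracy and error rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"precision": precision, "accuracy": accuracy, "error_rate": 1.0 - accuracy}


# Hypothetical toy tweets, invented for illustration (not the paper's data set).
TRAIN = [
    ("Down with the #flu, fever all night http://t.co/x", "health"),
    ("@doc this fever will not break", "health"),
    ("Flu jab, so sore", "health"),
    ("Fever and headache, staying home", "health"),
    ("Great match tonight, what a goal! #football", "non-health"),
    ("New phone arrived today http://t.co/y", "non-health"),
    ("Traffic is terrible on the way home", "non-health"),
    ("Watching a movie with friends tonight", "non-health"),
]
TEST = [
    ("Third day of fever, feeling awful", "health"),
    ("Is this flu or just a cold?", "health"),
    ("My throat hurts so much", "health"),
    ("Lovely sunny day at the beach", "non-health"),
    ("Flu season means cheap flights lol", "non-health"),
]

if __name__ == "__main__":
    rows = [preprocess(t) for t, _ in TRAIN]
    labels = [y for _, y in TRAIN]
    tree = build_tree(rows, labels, set().union(*rows))
    print(tree)  # ('fever', 'health', ('flu', 'health', 'non-health'))
    preds = [predict(tree, preprocess(t)) for t, _ in TEST]
    print(evaluate([y for _, y in TEST], preds))
```

On this toy data the tree splits on "fever" and then "flu"; it misses a health tweet that uses neither word and wrongly flags an off-topic tweet mentioning "flu", which is why precision, error rate and accuracy are reported together. Growing the tree until every leaf is pure overfits small data sets, which is what CART's pruning step is meant to counter. The Naïve Bayesian baseline from the abstract is not reproduced here.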


CART, Classification, Decision Tree, Machine Learning, Twitter





This work is licensed under a Creative Commons Attribution 3.0 License.