P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2023, Volume: 16, Issue: 46, Pages: 4410-4420

Original Article

Convolutional Neural Network-Based Automatic Speech Emotion Recognition System for Malayalam

Received Date: 17 August 2023, Accepted Date: 23 October 2023, Published Date: 15 December 2023


Objectives: This research develops a speech emotion recognition (SER) system for Malayalam, a low-resource Dravidian language of India, using a convolutional neural network (CNN) and deep learning techniques. Speech is a powerful and natural medium of communication, capable of conveying a wide range of information about a speaker's mental, behavioral, and emotional state. With the growing prevalence of human-machine interaction, speech analysis has played a crucial role in bridging the gap between the physical and digital realms. Emotion identification in particular has gained popularity, because emotions are frequently expressed through speech cues. However, the scarcity of suitable datasets poses a challenge for researchers conducting experiments.

Methods: We address this challenge by employing Convolutional Neural Networks (CNNs) to recognize emotions in voice recordings of Malayalam. We manually construct datasets from audio clips of Malayalam movies and use Mel-Frequency Cepstral Coefficients (MFCCs) to extract features from the audio signals.

Findings: The model is trained and tested on raw speech data from the constructed dataset. The proposed approach recognizes emotions in Malayalam voice signals with an average accuracy of 71%, indicating its ability to correctly predict emotions from vocal utterances in this under-resourced language.

Novelty: The novelty of this work lies in its focus on the challenges of emotion recognition in a low-resource language, the manual creation of datasets, and the successful adaptation of established techniques to a linguistic context where research is relatively scarce. Together, these contributions advance the field of speech emotion recognition and pave the way for further exploration of underrepresented languages.
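The MFCC feature extraction named under Methods can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' implementation: the abstract does not give their frame length, hop size, filter count, or number of coefficients, so the values below (25 ms frames, 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients, pre-emphasis 0.97) are assumed common defaults, and the helper names (`hz_to_mel`, `mel_filterbank`, `dct_ii`, `mfcc`) are hypothetical. The steps follow the standard recipe: pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, log compression, and a DCT-II that decorrelates the log filterbank energies.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumed choice; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))  # shape (n_out, n)
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sr=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13, preemph=0.97):
    """Return an MFCC matrix of shape (n_frames, n_ceps). Parameter values are assumptions."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies that are attenuated in voiced speech
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    flen = int(round(frame_len * sr))
    fstep = int(round(frame_step * sr))
    if len(emphasized) < flen:  # pad very short clips to one full frame
        emphasized = np.pad(emphasized, (0, flen - len(emphasized)))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    # Index matrix: row j selects samples of frame j
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    power = (np.abs(np.fft.rfft(frames, n=n_fft)) ** 2) / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)
    return dct_ii(log_energies, n_ceps)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic tone standing in for a speech clip
    print(mfcc(tone, sr=sr).shape)  # (98, 13)
```

The result is a frames × coefficients matrix. The abstract does not say how it is presented to the CNN (for example, as a 2-D input or averaged over frames into one vector per clip). In practice a library routine such as `librosa.feature.mfcc` would usually replace this hand-rolled version, although its default settings differ from the values assumed here.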

Keywords: Speech emotion recognition, Malayalam, Natural Language Processing, MFCC, CNN




© 2023 Muneer et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by the Indian Society for Education and Environment (iSee).

