Convolutional Neural Network-Based Automatic Speech Emotion Recognition System for Malayalam

V K Muneer; K P Mohamed Basheer; Rizwana Kallooravi Thandil

doi:10.17485/IJST/v16i46.2090

Indian Journal of Science and Technology

Year: 2023, Volume: 16, Issue: 46, Pages: 4410-4420

Original Article

V K Muneer¹*, K P Mohamed Basheer¹, Rizwana Kallooravi Thandil¹

¹Department of Computer Science, Sullamussalam Science College, Affiliated to University of Calicut, Kerala, India

*Corresponding Author
Email: [email protected]

Received Date: 17 August 2023, Accepted Date: 23 October 2023, Published Date: 15 December 2023

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: This work develops a speech emotion recognition (SER) system, based on convolutional neural networks (CNN) and deep learning, for Malayalam, a low-resource Dravidian language of India. Speech is a powerful and natural medium of communication that conveys a wide range of information about a speaker's mental, behavioral, and emotional characteristics. As human-machine interaction becomes more prevalent, speech analysis plays a crucial role in bridging the gap between the physical and digital realms. Emotion identification in particular has gained popularity, because emotions are frequently expressed through speech cues. However, the scarcity of suitable datasets remains a challenge for researchers conducting experiments.

Methods: We address this challenge by employing a Convolutional Neural Network (CNN) to recognize emotions in voice recordings of Malayalam. The dataset was constructed manually from audio clips of Malayalam movies, and Mel-Frequency Cepstral Coefficients (MFCC) were extracted from the audio signals as features.

Findings: The model was trained and tested on raw speech data from this dataset. The proposed approach recognizes emotions from Malayalam voice signals with an average accuracy of 71%, indicating that it can predict emotions from vocal utterances in this under-resourced language.

Novelty: The novelty of this work lies in addressing emotion recognition for a low-resource language, in the manual creation of the dataset, and in the successful adaptation of established techniques to a linguistic context where research is relatively scarce. Together, these contributions advance the field of speech emotion recognition and pave the way for further work on underrepresented languages.

Keywords: Speech emotion recognition, Malayalam, Natural Language Processing, MFCC, CNN
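
The abstract names MFCC as the feature extraction step but gives no parameters. As a rough illustration of what that step involves, the following NumPy sketch computes MFCCs through the usual chain of pre-emphasis, framing, windowing, power spectrum, mel filterbank, logarithm, and DCT. Every numeric setting here (16 kHz sampling rate, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients, 0.97 pre-emphasis) and every function name is an assumption chosen for illustration, not a value or API taken from the paper, whose actual pipeline may differ or rely on a library.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (hypothetical helper)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). All defaults are assumed."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Pre-emphasis boosts high frequencies before analysis.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum, mel filterbank energies, log compression, then DCT.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return dct_ii(np.log(energies + 1e-10), n_coeffs)

# Usage on a synthetic one-second tone (not data from the paper).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(16000)
features = mfcc(tone)
print(features.shape)  # (98, 13)
```

A matrix of this kind, one row of coefficients per frame, is the sort of input a CNN classifier can consume, either as a 2D map or after pooling over time. How the authors shape their features for the network is not stated in the abstract.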

Copyright

© 2023 Muneer et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)