Weighted Mel frequency cepstral coefficient based feature extraction for automatic assessment of stuttered speech using Bi-directional LSTM

Sakshi Gupta; Ravi S Shukla; Rajesh K Shukla

doi:10.17485/IJST/v14i5.2276

Indian Journal of Science and Technology

Year: 2021, Volume: 14, Issue: 5, Pages: 457-472

Original Article

Sakshi Gupta¹*, Ravi S Shukla², Rajesh K Shukla³

¹Department of Computer Science and Engineering, Invertis University, Bareilly, Uttar Pradesh, India
²Department of Computer Science, Saudi Electronic University, Tabuk, Saudi Arabia
³Department of Engineering and Technology, Invertis University, Bareilly, Uttar Pradesh, India

*Corresponding Author
Email: [email protected]

Received Date: 24 December 2020, Accepted Date: 30 January 2021, Published Date: 16 February 2021

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objective: To propose a system for the automatic assessment of stuttered speech that assists Speech Language Pathologists in treating people who stutter.

Methods: A novel technique is proposed for the automatic assessment of stuttered speech. It combines feature extraction based on the Weighted Mel Frequency Cepstral Coefficient (WMFCC) with classification by a Bi-directional Long Short-Term Memory (Bi-LSTM) neural network. It mainly focuses on detecting two kinds of stuttered events: prolongation, and repetition of syllables, words, and phrases.

Findings: This study compares the WMFCC feature extraction method with extensions of the widely used MFCC, namely the Delta and Delta-Delta cepstrum. The speech parameterization techniques are compared by varying the frame length, the percentage of window overlap, and the alpha value of the pre-emphasis filter. The experiments show that WMFCC outperforms the other feature extraction methods, with an average recognition accuracy of 96.67%. The 14-dimensional WMFCC also has a lower computational overhead than the conventional 42-dimensional MFCC that includes the Delta and Delta-Delta cepstrum.

Application: Integrating WMFCC-based speech feature extraction with deep learning Bi-LSTM classification, as proposed in this study, gives an efficient model for automatically classifying stuttered events such as prolongation and repetition.

Keywords: Stuttering; MFCC; Delta MFCC; WMFCC; BiLSTM
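
The abstract contrasts a 14-dimensional WMFCC vector with a 42-dimensional vector of MFCC plus Delta and Delta-Delta coefficients, and names the frame length, window overlap, and pre-emphasis alpha as the tuned parameters. The sketch below illustrates those quantities under explicit assumptions: the abstract does not state the weighting formula, so the code assumes the weighted feature is a sum of the static coefficients and scaled Delta and Delta-Delta terms. The weights p and q, the regression width, the default alpha, and all function names are hypothetical choices for illustration, not values from the article. The MFCC matrix itself is taken as given.

```python
# Illustrative sketch only. Weights, regression width, default alpha and
# helper names are assumptions, not values reported in the article.

def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; alpha is the pre-emphasis coefficient."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, overlap):
    """Split a signal into frames of frame_len samples; overlap is a fraction in [0, 1)."""
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def delta(frames, width=2):
    """Regression-based delta along time; edge frames are repeated as padding."""
    denom = 2 * sum(k * k for k in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            num = sum(
                k * (frames[min(t + k, last)][d] - frames[max(t - k, 0)][d])
                for k in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


def stacked_mfcc(mfcc):
    """Concatenate static, Delta and Delta-Delta coefficients (3x the dimension)."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [c + a + b for c, a, b in zip(mfcc, d1, d2)]


def weighted_mfcc(mfcc, p=0.5, q=0.25):
    """Assumed weighted form: c + p * delta(c) + q * delta(delta(c)), same dimension as c."""
    d1 = delta(mfcc)
    d2 = delta(d1)
    return [
        [c + p * a + q * b for c, a, b in zip(cf, df, ddf)]
        for cf, df, ddf in zip(mfcc, d1, d2)
    ]
```

Given 14 static coefficients per frame, stacked_mfcc yields 42 values per frame while weighted_mfcc keeps 14, which is the dimensionality difference the abstract credits for the lower computational overhead.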

Copyright

© 2021 Gupta et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee).