P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2021, Volume: 14, Issue: 5, Pages: 457-472

Original Article

Weighted Mel frequency cepstral coefficient based feature extraction for automatic assessment of stuttered speech using Bi-directional LSTM

Received Date: 24 December 2020, Accepted Date: 30 January 2021, Published Date: 16 February 2021

Abstract

Objective: To propose a system for the automatic assessment of stuttered speech that supports Speech Language Pathologists in treating people who stutter.

Methods: The proposed technique combines feature extraction based on the Weighted Mel Frequency Cepstral Coefficient (WMFCC) with classification by a Bi-directional Long Short-Term Memory (Bi-LSTM) neural network. It focuses on detecting prolongations and syllable, word, and phrase repetitions in stuttered events.

Findings: The study compares WMFCC feature extraction with widely used extensions of MFCC, namely the Delta and Delta-Delta cepstrum. The speech parameterization techniques are compared across different frame lengths, percentages of window overlap, and pre-emphasis filter alpha values. The experiments show that WMFCC outperforms the other feature extraction methods, with an average recognition accuracy of 96.67%. The 14-dimensional WMFCC also incurs lower computational overhead than the conventional 42-dimensional MFCC that includes the Delta and Delta-Delta cepstrum.

Application: Integrating WMFCC-based speech feature extraction with deep learning Bi-LSTM classification, as proposed in this study, yields an efficient model for automatically classifying stuttered events such as prolongation and repetition.
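The front-end parameters named in the abstract (pre-emphasis alpha, frame length, window overlap, Delta and Delta-Delta cepstrum) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state how WMFCC weights its terms, so `weighted_combine` and its weights `p` and `q` are assumptions chosen only to show how dynamic information can be folded into a 14-dimensional vector instead of being concatenated into a 42-dimensional one. All function names are hypothetical, and frame length is given in samples for simplicity.

```python
from typing import List

Frame = List[float]


def pre_emphasis(signal: List[float], alpha: float = 0.97) -> List[float]:
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal: List[float], frame_len: int, overlap: float) -> List[Frame]:
    """Split a signal into frames of frame_len samples.

    overlap is the fraction of each frame shared with the next one (0 <= overlap < 1).
    Trailing samples that do not fill a whole frame are dropped.
    """
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def delta(features: List[Frame], width: int = 2) -> List[Frame]:
    """Regression-based delta over +/- width frames, repeating the edge frames."""
    num_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(features[t])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = features[min(num_frames - 1, t + n)][k]
                prv = features[max(0, t - n)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def weighted_combine(static: List[Frame], p: float = 0.5, q: float = 0.25) -> List[Frame]:
    """ASSUMPTION: static + p * delta + q * delta-delta, with hypothetical weights.

    The output keeps the dimensionality of the static coefficients.
    """
    d1 = delta(static)
    d2 = delta(d1)
    return [
        [c + p * a + q * b for c, a, b in zip(c_row, a_row, b_row)]
        for c_row, a_row, b_row in zip(static, d1, d2)
    ]
```

Given 14 static cepstral coefficients per frame, concatenating the Delta and Delta-Delta terms gives 3 × 14 = 42 values per frame, whereas a weighted sum of this kind keeps 14. That matches the dimensionalities the abstract compares.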

Keywords: Stuttering; MFCC; Delta MFCC; WMFCC; BiLSTM

Copyright

© 2021 Gupta et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee)
