P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2021, Volume: 14, Issue: 41, Pages: 3082-3092

Original Article

Mel Frequency Cepstral Coefficients based Bacterial Foraging Optimization with DNN-RBF for Speaker Recognition

Received Date: 04 October 2021, Accepted Date: 18 November 2021, Published Date: 04 December 2021

Abstract

Objectives: To improve the accuracy and reduce the time complexity of speaker recognition using Mel-Frequency Cepstral Coefficients (MFCCs) and Bacterial Foraging Optimization (BFO) with a Deep Neural Network-Radial Basis Function (DNN-RBF) classifier.

Method: The audio speech signal is pre-processed and the MFCCs of each speech sample are derived. The features are optimized with the BFO algorithm and classified towards the target speaker using DNN-RBF. Finally, a probability score is generated for each speaker to identify the speaker. The proposed MBFOB speaker recognition method is evaluated on the TIMIT read speech corpus, which contains a total of 6300 phrases: 10 phrases from each of 630 speakers.

Findings: Speaker recognition validates a user's identity in fields such as authentication and surveillance, using features extracted from the audio speech signal. This paper proposes MBFOB, a speaker identification solution based on MFCCs and DNN-RBF with BFO. Speech utterances from the TIMIT corpus are pre-processed to obtain MFCC feature vectors; DNN-RBF classifies the speaker, and the feature vectors in the output layers are optimized with BFO. Finally, a score is calculated for each speaker to identify the speaker. The proposed technique is evaluated with the equal error rate (EER), detection cost function (DCF), average cost (Cavg) and accuracy. Its execution time is lower than that of the existing methods it is compared with, and the comparison of experimental results with these methods shows the efficiency of the approach.

Novelty: This study proposes a novel MFCC-based Bacterial Foraging Optimization with DNN-RBF for identifying the exact speaker.
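The MFCC front end described in the Method (pre-processing followed by cepstral feature extraction) can be sketched in NumPy as below. This is a generic textbook pipeline, not the authors' implementation: the 16 kHz sampling rate (TIMIT's rate), 25 ms frames with a 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients and the 0.97 pre-emphasis factor are assumed values, and the function names are hypothetical. The DCT-II is written out as a matrix so that only NumPy is needed.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10, n_fft=512,
         n_mels=26, n_ceps=13, pre_emph=0.97):
    """Return an (n_frames, n_ceps) array of MFCCs for a 1-D signal (assumed defaults)."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis: y[n] = x[n] - a * x[n-1] boosts high frequencies
    x = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Overlapping frames, each tapered by a Hamming window
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    # 3. Periodogram estimate of the power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4. Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    energies = np.maximum(power @ fbank.T, np.finfo(float).eps)
    # 5. Log filterbank energies, then a DCT-II to decorrelate them
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    return np.log(energies) @ dct.T

# Toy usage: one second of a noisy 440 Hz tone at 16 kHz
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
sig = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(t.size)
feats = mfcc(sig)
print(feats.shape)  # (98, 13)
```

Each row of `feats` is one 25 ms frame; in a recognition system these frame-level vectors (often with delta coefficients, not shown) feed the later optimization and classification stages.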
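BFO, as introduced by Passino, alternates chemotaxis (a random tumble followed by swims in the same direction while the cost keeps improving), reproduction (the healthier half of the population splits and replaces the rest) and elimination-dispersal (random relocation of some bacteria). The sketch below is a simplified, generic version that omits the cell-to-cell swarming term and clips positions to a search box. The abstract does not state which objective the authors' BFO minimizes over the MFCC features, so a toy sphere function stands in as an assumed cost, and every parameter value is illustrative.

```python
import numpy as np

def bfo_minimize(cost, dim, bounds=(-5.0, 5.0), n_bacteria=20, n_chem=30,
                 n_swim=4, n_repro=4, n_elim=2, p_elim=0.25, step=0.1, seed=0):
    """Minimize `cost` with a simplified Bacterial Foraging Optimization (no swarming term)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_bacteria, dim))
    best_x, best_f = None, np.inf

    def evaluate(x):
        # Evaluate the cost and remember the best position seen so far
        nonlocal best_x, best_f
        f = cost(x)
        if f < best_f:
            best_x, best_f = x.copy(), f
        return f

    for _ in range(n_elim):                        # elimination-dispersal events
        for _ in range(n_repro):                   # reproduction steps
            health = np.zeros(n_bacteria)          # accumulated cost, lower is healthier
            for _ in range(n_chem):                # chemotactic steps
                for i in range(n_bacteria):
                    f_last = evaluate(pos[i])
                    # Tumble: one step in a random unit direction
                    d = rng.standard_normal(dim)
                    d /= np.linalg.norm(d)
                    pos[i] = np.clip(pos[i] + step * d, lo, hi)
                    f = evaluate(pos[i])
                    # Swim: keep moving in the same direction while the cost improves
                    for _ in range(n_swim):
                        if f >= f_last:
                            break
                        f_last = f
                        pos[i] = np.clip(pos[i] + step * d, lo, hi)
                        f = evaluate(pos[i])
                    health[i] += f
            # Reproduction: the healthiest half is copied over the least healthy half
            order = np.argsort(health)
            half = n_bacteria // 2
            pos[order[n_bacteria - half:]] = pos[order[:half]]
        # Elimination-dispersal: relocate each bacterium with probability p_elim
        disperse = rng.random(n_bacteria) < p_elim
        pos[disperse] = rng.uniform(lo, hi, size=(int(disperse.sum()), dim))
    return best_x, best_f

# Toy usage: minimize a 2-D sphere function (stand-in objective)
sphere = lambda x: float(np.sum(x ** 2))
best_x, best_f = bfo_minimize(sphere, dim=2)
```

The fixed step size limits precision to roughly the step length; adaptive step sizes and the swarming term are common refinements that this sketch leaves out.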
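The abstract does not detail the DNN-RBF architecture. As a hedged illustration of only the RBF stage and the per-speaker probability score, the sketch below builds a plain RBF network: Gaussian hidden units centred on a few training vectors from each speaker, a linear output layer fitted by ridge regression, and a softmax that converts the outputs into probabilities. The center selection, width `gamma`, regularization `reg`, softmax scoring and all function names are assumptions; no deep layers are included, and the toy data are synthetic, not TIMIT features.

```python
import numpy as np

def rbf_features(X, centers, gamma):
    # Gaussian RBF activations: phi_j(x) = exp(-gamma * ||x - c_j||^2)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_rbf(X, y, n_classes, n_per_class=5, gamma=0.1, reg=1e-3, seed=0):
    """Pick centres per speaker and fit linear output weights by ridge regression."""
    rng = np.random.default_rng(seed)
    centers = np.vstack([
        X[rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)]
        for c in range(n_classes)
    ])
    phi = rbf_features(X, centers, gamma)
    targets = np.eye(n_classes)[y]                 # one-hot speaker labels
    k = len(centers)
    W = np.linalg.solve(phi.T @ phi + reg * np.eye(k), phi.T @ targets)
    return centers, W

def speaker_probabilities(X, centers, W, gamma=0.1):
    # A softmax over the linear outputs gives a probability score per speaker
    z = rbf_features(X, centers, gamma) @ W
    z = z - z.max(axis=1, keepdims=True)           # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

# Toy usage: three synthetic "speakers" with 13-dimensional MFCC-like vectors
rng = np.random.default_rng(1)
means = rng.normal(0.0, 3.0, size=(3, 13))
X = np.vstack([m + rng.normal(0.0, 0.5, size=(30, 13)) for m in means])
y = np.repeat(np.arange(3), 30)
centers, W = fit_rbf(X, y, n_classes=3)
probs = speaker_probabilities(X, centers, W)
pred = probs.argmax(axis=1)                        # identified speaker per vector
```

Choosing centres per speaker guarantees every class has hidden units that respond to it; clustering (for example k-means) is another common way to place RBF centres.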
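The EER and DCF used in the evaluation are computed from target (same-speaker) and non-target trial scores by sweeping a decision threshold. The sketch below shows one straightforward way to do this. The DCF parameters (P_target = 0.01, C_miss = 10, C_fa = 1) are assumed NIST SRE-style values because the abstract does not give the ones used, the DCF is left unnormalized, and the EER is taken at the threshold where the miss and false-alarm rates are closest, without interpolation.

```python
import numpy as np

def eer_and_min_dcf(target_scores, nontarget_scores,
                    p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Equal error rate and minimum (unnormalized) detection cost from trial scores."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    # Candidate thresholds: every observed score, plus +inf to reject everything
    thresholds = np.concatenate([np.unique(np.concatenate([tgt, non])), [np.inf]])
    # A trial is accepted when its score is >= the threshold
    p_miss = np.array([(tgt < t).mean() for t in thresholds])
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    # EER: operating point where the miss and false-alarm rates are closest
    i = np.argmin(np.abs(p_miss - p_fa))
    eer = (p_miss[i] + p_fa[i]) / 2
    # DCF = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target), minimized over thresholds
    dcf = c_miss * p_target * p_miss + c_fa * (1 - p_target) * p_fa
    return eer, dcf.min()

# Perfectly separated scores give eer == 0.0 and min_dcf == 0.0
eer, min_dcf = eer_and_min_dcf([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0])
```

With overlapping scores, for example targets `[0.9, 0.8, 0.7, 0.4]` against non-targets `[0.1, 0.2, 0.3, 0.75]`, one target is missed and one impostor accepted at the crossing threshold, giving an EER of 0.25.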

Keywords: BFO; DNN; RBF; speech processing; speaker recognition; MFCC extraction; deep neural network; bacterial foraging optimization; scoring

Copyright

© 2021 Subhashini Pedalanka et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee)
