Mel Frequency Cepstral Coefficients based Bacterial Foraging Optimization with DNN-RBF for Speaker Recognition

P S Subhashini Pedalanka; M SatyaSai Ram; Duggirala Sreenivasa Rao

doi:10.17485/IJST/v14i41.1858

Indian Journal of Science and Technology

Year: 2021, Volume: 14, Issue: 41, Pages: 3082-3092

Original Article

P S Subhashini Pedalanka¹,²*, M SatyaSai Ram³, Duggirala Sreenivasa Rao⁴

¹Associate Professor, Department of E.C.E, R.V.R & JC College of Engineering, Chowdavaram, Guntur, India
²Research scholar, Department of E.C.E, JNTUH, Hyderabad, India
³Professor, Department of E.C.E, R.V.R & JC College of Engineering, Chowdavaram, Guntur, India
⁴Professor, Department of E.C.E, JNTUH, Hyderabad, India

*Corresponding Author
Email: [email protected]

Received Date: 04 October 2021, Accepted Date: 18 November 2021, Published Date: 04 December 2021

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: To improve the accuracy and reduce the time complexity of a speaker recognition system using Mel-Frequency Cepstral Coefficients (MFCCs) and Bacterial Foraging Optimization (BFO) with a Deep Neural Network-Radial Basis Function (DNN-RBF) classifier.

Method: The MFCCs of each speech sample are derived by pre-processing the audio speech signal. The features are optimized with the BFO algorithm and then classified towards the target speaker using DNN-RBF. Finally, a probability score is generated for each speaker to identify the speaker. The proposed MBFOB speaker recognition method is evaluated on the TIMIT read-speech corpus, which contains a total of 6300 phrases, 10 phrases from each of 630 speakers.

Findings: Speaker recognition is used to validate a user's identity in authentication and surveillance applications, with features extracted from the audio speech signal. This paper proposes MBFOB, an approach based on MFCCs and DNN-RBF with BFO, for the identification of speakers. Speech utterances from the TIMIT corpus are preprocessed to obtain MFCC feature vectors. DNN-RBF is used to classify the speaker, and the feature vectors in the output layers are optimized with BFO. Finally, the scores for each speaker are calculated to identify the speaker. The proposed technique is evaluated with several metrics: equal error rate (EER), detection cost function (DCF), Cavg, and accuracy. The execution time of the proposed method is found to be lower than that of the existing methods. The experimental findings are compared with other current methods and show the efficiency of the approach.

Novelty: A novel MFCC-based Bacterial Foraging Optimization with DNN-RBF for identifying the exact speaker is proposed in this study.
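
The MFCC extraction step named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state any front-end settings, so the 16 kHz sample rate, 25 ms frames, 10 ms step, 512-point FFT, 26 mel filters, 13 coefficients, and 0.97 pre-emphasis used here are common defaults assumed for illustration, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13, pre_emphasis=0.97):
    """Return an (n_frames, n_ceps) MFCC matrix.

    All default parameter values are assumptions, not taken from the paper.
    The signal must contain at least one full frame.
    """
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies before analysis.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II decorrelates the log energies; keep n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T

# Usage: one second of synthetic noise stands in for a speech utterance.
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000))
print(features.shape)
```

Each row of the result is one frame's feature vector. In a pipeline like the one described above, such vectors would be the input to the optimization and classification stages, which this sketch does not attempt to reproduce.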

Keywords: BFO; DNN; RBF; Speech processing; Speaker recognition; MFCC extraction; Deep neural network; Bacterial foraging optimization; Scoring

Copyright

© 2021 Subhashini Pedalanka et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee).