• P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2022, Volume: 15, Issue: 27, Pages: 1364-1371

Original Article

Use of Bidirectional Long Short Term Memory in Spoken Word Detection with reference to the Assamese language

Received Date: 22 March 2022, Accepted Date: 06 June 2022, Published Date: 18 July 2022

Abstract

Objectives: The proposed method applies a deep learning technique to identify spoken words in the Assamese language. Deep neural network (DNN) based algorithms have already been applied successfully in image recognition, computer vision, natural language processing and medical image analysis.

Methods: The method used is Bidirectional Long Short Term Memory (BLSTM), which incorporates both past and future context. The speech database for this work is obtained from the repository of the Indian Language Technology Proliferation and Development Center (ILTP-DC). This repository contains 32,335 utterances by 1,000 male and female participants, covering 262 unique native Assamese words. The BLSTM-based recognition model uses 10 of the 262 unique words; the remaining words are used to construct synthesized sentences. The feature extraction module uses 39 feature coefficients, composed of Mel-frequency cepstral coefficients (MFCC) and their first and second derivatives (ΔMFCC and ΔΔMFCC).

Findings: The Word Error Rate (WER) of the BLSTM-based recognition model is 18.84% with an average accuracy of 98.12%, which sets a promising benchmark when compared to recent findings.

Novelty: This work takes a different approach to detecting certain keywords of the Assamese language by adopting a deep learning methodology. Future work aims to improve the detection capability of the model by combining multiple DNN models in a hybrid approach and by including additional features.
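The Word Error Rate reported in the Findings is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition with a word-level edit distance. It is not the authors' evaluation code: the function name `word_error_rate` and the placeholder tokens are assumptions made for illustration, and the abstract does not state how its WER was computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; not taken from the article.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein distance, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens: one substitution (b -> x) and one deletion (d).
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors while the denominator is the reference length, WER computed this way can exceed 100%, and it is a different quantity from classification accuracy over a fixed keyword set.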

Keywords: Bidirectional Long Short Term Memory; Deep Learning; Speech recognition; WER; MFCC

Copyright

© 2022 Kalita et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published By Indian Society for Education and Environment (iSee)
