Development of Small Vocabulary Continuous Speech-to-Text System for Kannada Language/Dialects

Indian Journal of Science and Technology

DOI: 10.17485/IJST/v15i45.1884

Year: 2022, Volume: 15, Issue: 45, Pages: 2476-2481

Original Article

G Thimmaraja Yadava¹*, B G Nagaraja², S Yogesh Kumaran³, A C Ramachandra¹, N M Arun Kumar¹

¹Nitte Meenakshi Institute of Technology, Bengaluru, Karnataka, India
²Vidyavardhaka College of Engineering, Mysuru, Karnataka, India
³Faculty of Engineering and Technology, Jain deemed-to-be University, Kanakapura, Karnataka, India

*Corresponding Author
Email: [email protected]

Received Date: 19 September 2022, Accepted Date: 30 October 2022, Published Date: 07 December 2022

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: To develop a speech-to-text (STT) system for continuous Kannada language/dialects using the Kaldi speech recognition toolkit. Methods: Continuous Kannada speech data were collected in the field from 100 speakers/farmers of Karnataka state. A lexicon/dictionary and a phoneme set for Kannada language/dialects were created, and the collected speech data were transcribed using a transcriber tool. ASR models were developed at different phoneme levels using Kaldi. Findings: A robust small-vocabulary STT system for continuous Kannada speech was developed using Kaldi. Various acoustic modelling techniques were used to build the ASR model, which achieved a word error rate (WER) of 0.23%. The performance of the developed ASR model was compared with existing works and analyzed through offline speech recognition. Novelty: Many STT systems have been developed for Indian and international languages/dialects, but not for the Kannada language. This work is the first of its kind to use Kaldi for the Kannada language under the constraint of limited data. The developed ASR model could be used further in the development of an end-to-end ASR system for speech processing applications.

Keywords: Automatic Speech Recognition (ASR); Word Error Rate (WER); Continuous Kannada Speech Data; Kannada Language/Dialects; Lexicon
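
The abstract reports its result as a word error rate (WER). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the recognized text into the reference transcript, and N is the number of words in the reference. It is not the authors' scoring code (Kaldi ships its own scoring tools); the function name `word_error_rate` and the example sentences are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N via word-level Levenshtein distance.

    Illustrative helper (hypothetical name), not part of Kaldi.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder English words stand in for transcribed Kannada words.
    reference = "what is the price today"
    hypothesis = "what is price to day"
    print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

WER is usually quoted as a percentage over the whole test set, that is, the total number of word errors divided by the total number of reference words, and not as an average of per-utterance values.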

Copyright

© 2022 Yadava et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee).