Development of Speaker-Independent Automatic Speech Recognition System for Kannada Language

Praveen Kumar; H S Jayanna

doi:10.17485/IJST/v15i8.2322

Indian Journal of Science and Technology

Year: 2022, Volume: 15, Issue: 8, Pages: 333-342

Original Article

Praveen Kumar¹*, H S Jayanna²

¹Research Scholar, Department of ECE, Siddaganga Institute of Technology, Tumakuru, 572103, Karnataka, India
²Department of ISE, Siddaganga Institute of Technology, Tumakuru, 572103, Karnataka, India

*Corresponding Author
Email: [email protected]

Received Date: 12 November 2021, Accepted Date: 22 January 2022, Published Date: 02 March 2022

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: The primary goal is to establish a Continuous Speech Recognition (CSR) framework for recognising continuous speech in Kannada. This is a difficult task because Kannada is a regional language for which speech database resources are scarce.

Methods: Monophone, triphone, deep neural network (DNN)-hidden Markov model (HMM) and Gaussian mixture model (GMM)-HMM acoustic models were implemented in the Kaldi toolkit and used for continuous Kannada speech recognition (CKSR). Feature vectors were extracted from the speech data using Mel-frequency cepstral coefficients (MFCC). The continuous Kannada speech database comprises 2800 speakers (1680 male and 1120 female) aged 8 to 80 years. The data were divided between training and testing in the ratio 80:20. Hybrid modelling techniques were implemented to recognise the spoken words.

Findings: Model performance was measured by word error rate (WER), and the results were assessed against those for well-known datasets such as TIMIT and Aurora-4. Using Kaldi-based feature extraction recipes, the monophone, triphone, DNN-HMM and GMM-HMM acoustic models obtained WERs of 8.23%, 5.23%, 4.05% and 4.64% respectively. The experimental results suggest that the recognition rate on the Kannada speech data is higher than that obtained on state-of-the-art databases.

Novelty: We propose a novel automatic speech recognition system for the Kannada language, motivated by the limited availability of standard continuous Kannada speech resources. We created a large-vocabulary Kannada database and implemented monophone, triphone, subspace Gaussian mixture model (SGMM) and hybrid modelling techniques to develop the system.

Keywords: DNN; Continuous speech; HMM; Kannada dialect; Kaldi toolkit; monophone; triphone; WER
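
For readers unfamiliar with the metric reported in the Findings, WER is conventionally defined as WER = (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words in the best alignment of the recognised transcript against the reference, and N is the number of words in the reference. The following is a minimal sketch of that definition in Python. It is not the scoring code used in the paper; the function name `word_error_rate` and the placeholder word sequences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using word-level Levenshtein distance.

    Illustrative sketch only: the function name and whitespace
    tokenisation are assumptions, not the paper's scoring procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not data from the Kannada database.
    # One substitution (w2 -> wX) and one deletion (w4) over 4 reference words.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

Under this definition, a WER of 4.05% corresponds to roughly four word errors per hundred reference words. Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.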

Copyright

© 2022 Kumar & Jayanna. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published By Indian Society for Education and Environment (iSee)