P-ISSN: 0974-6846, E-ISSN: 0974-5645

Indian Journal of Science and Technology

Year: 2023, Volume: 16, Issue: 4, Pages: 282-291

Original Article

Low Resource Kannada Speech Recognition using Lattice Rescoring and Speech Synthesis

Received Date: 10 December 2022, Accepted Date: 05 January 2023, Published Date: 31 January 2023

Abstract

Objectives: To improve the accuracy of low-resource speech recognition for a model trained on only 4 hours of transcribed continuous Kannada speech, using data augmentation.

Methods: For initial decoding, the baseline language model is augmented with unigram counts of words that appear in the Wikipedia text corpus but are absent from the baseline. Lattice rescoring is then applied using a language model augmented with the Wikipedia text. Speech-synthesis-based augmentation is employed, using multi-speaker, syllable-based synthesis with Kannada and cross-lingual Telugu voices. For Kannada, we synthesize basic syllables, syllables with consonant conjuncts, and words containing syllables that are absent from the training speech.

Findings: An overall word error rate (WER) of 9.04% is achieved against a baseline WER of 40.93%. Language model augmentation and lattice rescoring give an absolute improvement of 16.68%. Applying our syllable-based speech synthesis on top of language model augmentation and rescoring yields a total WER reduction of 31.89%. The proposed language model augmentation is memory efficient, consuming only 1/8th of the memory required for decoding with the Wikipedia-augmented language model (2 gigabytes versus 18 gigabytes) while giving comparable WER (22.95% for Wikipedia versus 24.25% for our method). Augmentation with synthesized syllables enhances the model's ability to recognize basic sounds, improving recognition of out-of-vocabulary words to 90% and of in-vocabulary words to 97%.

Novelty: We propose novel methods of language model augmentation and synthesis-based augmentation that achieve a low WER for a speech recognition model trained on only 4 hours of continuous speech. Obtaining high recognition accuracy (low WER) from a very small speech corpus is a challenge; in this paper, we demonstrate that high accuracy can be achieved for small-corpus speech recognition using data augmentation.

Keywords: Low resource; Speech synthesis; Data augmentation; Language model; Lattice rescoring
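
As a minimal illustration of the language model augmentation step described in the Methods, the Python sketch below collects unigram counts of Wikipedia words that are absent from the baseline vocabulary, producing a count file that can be merged into the baseline language model before first-pass decoding. The file names, the one-word-per-line vocabulary format, and the plain-text corpus format are assumptions made for illustration; this is not the authors' implementation.

```python
# Sketch: collect unigram counts of Wikipedia words missing from the baseline
# LM vocabulary, so they can be added to the baseline LM for initial decoding.
# File names and formats below are illustrative assumptions.
from collections import Counter

def load_vocab(path):
    """Baseline LM vocabulary, one word per line (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def oov_unigram_counts(corpus_path, vocab):
    """Count corpus words that the baseline LM does not know."""
    counts = Counter()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            for word in line.split():
                if word not in vocab:
                    counts[word] += 1
    return counts

if __name__ == "__main__":
    vocab = load_vocab("baseline_vocab.txt")                 # placeholder path
    counts = oov_unigram_counts("kn_wikipedia.txt", vocab)   # placeholder path
    with open("oov_unigrams.txt", "w", encoding="utf-8") as out:
        for word, count in counts.most_common():
            out.write(f"{word}\t{count}\n")   # word<TAB>count for LM augmentation
```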
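
Lattice rescoring is normally performed inside the decoder on first-pass lattices; its effect can be sketched with n-best rescoring, where each first-pass hypothesis keeps its acoustic score but its language model score is recomputed with the augmented model before re-ranking. The dictionary-based unigram language model, the toy scores, and the weight below are placeholders and do not correspond to the models used in the paper.

```python
# Sketch: re-rank first-pass hypotheses with an augmented LM (a stand-in for
# lattice rescoring). The unigram LM and scores are illustrative only.
def lm_logprob(words, unigram_logprob, oov_logprob=-10.0):
    """Score a hypothesis with a placeholder unigram language model."""
    return sum(unigram_logprob.get(w, oov_logprob) for w in words)

def rescore(nbest, unigram_logprob, lm_weight=0.8):
    """nbest: list of (words, acoustic_logprob) pairs from first-pass decoding."""
    scored = [(am + lm_weight * lm_logprob(words, unigram_logprob), words)
              for words, am in nbest]
    return max(scored, key=lambda s: s[0])[1]   # best hypothesis after rescoring

# Toy usage: the augmented LM knows the correct word form and flips the ranking.
augmented_lm = {"ನಮಸ್ಕಾರ": -2.0, "ಬೆಂಗಳೂರು": -3.0}
nbest = [(["ನಮಸ್ಕಾರ", "ಬೆಂಗಳೂರು"], -120.5),
         (["ನಮಸ್ಕಾರ", "ಬೆಂಗಳೂರ"], -119.8)]
print(rescore(nbest, augmented_lm))
```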
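
The synthesis-based augmentation targets syllables that the 4-hour training set never covers. That selection step can be sketched as follows: split words into akshara-like units and keep corpus words containing units unseen in the training transcripts, which can then be passed to a text-to-speech system. The regular expression is a simplified Kannada akshara splitter assumed only for illustration; the paper's exact syllabification and the multi-speaker Kannada/Telugu synthesis pipeline are not reproduced here.

```python
# Sketch: pick words whose syllable-like units (aksharas) never occur in the
# training transcripts, as candidates for speech-synthesis-based augmentation.
# The regex is a crude akshara splitter, assumed only for illustration.
import re

AKSHARA = re.compile(
    r"[\u0C95-\u0CB9](?:\u0CCD[\u0C95-\u0CB9])*"   # consonant with optional conjuncts
    r"[\u0CBE-\u0CCC]?[\u0C82\u0C83]?"             # optional vowel sign, anusvara/visarga
    r"|[\u0C85-\u0C94]"                            # independent vowel
)

def syllables(word):
    return AKSHARA.findall(word)

def synthesis_candidates(train_words, corpus_words):
    """Corpus words containing at least one syllable unseen in training speech."""
    seen = {s for w in train_words for s in syllables(w)}
    return sorted({w for w in corpus_words
                   if any(s not in seen for s in syllables(w))})

# Toy usage with a tiny training vocabulary and corpus vocabulary.
print(synthesis_candidates({"ನಮಸ್ಕಾರ", "ಶಾಲೆ"},
                           {"ವಿಜ್ಞಾನ", "ಶಾಲೆ", "ಕರ್ನಾಟಕ"}))
```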

Copyright

© 2023 Murthy & Sitaram. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)
