Comparative Analysis of Kannada Formant Synthesized Utterances and their Quality

Alfred Vivek D’Souza; D J Ravi

doi:10.17485/IJST/v16i5.2091


Indian Journal of Science and Technology


Year: 2023, Volume: 16, Issue: 5, Pages: 309-317

Original Article


Alfred Vivek D’Souza¹*, D J Ravi²

¹Research Scholar, Department of ECE, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India
²Research Supervisor, Department of ECE, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India

*Corresponding Author
Email: [email protected]

Received Date: 27 October 2022, Accepted Date: 16 December 2022, Published Date: 04 February 2023

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: To synthesize Kannada utterances using a modified Klatt-type formant synthesizer and to evaluate its performance against the eSpeak synthesizer in terms of the intelligibility and quality of the generated utterances.

Methods: Kannada utterances, viz. vowels, diphthongs, Consonant-Vowel (CV) coarticulations and simple words, are generated using a modified Klatt-type formant synthesizer and eSpeak. The vowels and diphthongs generated by both synthesizers are compared with naturally recorded utterances using their F1-F2 formants, and the CV coarticulations are compared using spectrograms. The synthesized word utterances are compared with naturally recorded utterances using Log Spectral Distance to determine which synthesizer produces the frequency spectrum closest to that of the natural utterances. The synthesized word utterances are also evaluated for intelligibility and quality using the Mean Opinion Score (MOS) obtained from 10 native Kannada speakers.

Findings: In the MOS evaluation, the word utterances synthesized by the modified Klatt-type formant synthesizer scored 86% for intelligibility and 4.46 out of 5 for quality, whereas eSpeak scored 70% and 4.14 out of 5 respectively.

Novelty: A Klatt-type formant synthesizer that uses a pitch-synchronous parameter update method synthesizes good-quality Kannada utterances, and storing the synthesizer's control parameters as polynomials reduces the database footprint.

Keywords: Kannada Formant Synthesizer; Klatt type Synthesizer; eSpeak; Kannada TTS; Formant synthesis quality
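
The Log Spectral Distance comparison named in the Methods can be illustrated with a minimal sketch. The abstract does not give the authors' exact procedure, so everything below is an assumption made for illustration: the function names `power_spectra` and `log_spectral_distance`, the Hann window, the frame length and hop size, and the premise that the two signals share a sampling rate and are already time-aligned. It uses one common definition of the measure, the root-mean-square difference of the log power spectra in dB per frame, averaged over frames.

```python
import numpy as np


def power_spectra(signal, frame_len=512, hop=256, eps=1e-10):
    """Return per-frame power spectra (frames x bins) using a Hann window.

    frame_len, hop and eps are illustrative defaults, not values from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad very short signals so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    # eps keeps the later logarithm finite for silent bins.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps


def log_spectral_distance(reference, synthesized, frame_len=512, hop=256):
    """Mean per-frame RMS difference (in dB) between two log power spectra."""
    ref = power_spectra(reference, frame_len, hop)
    syn = power_spectra(synthesized, frame_len, hop)
    n = min(len(ref), len(syn))  # compare only the frames both signals have
    diff_db = 10.0 * np.log10(ref[:n]) - 10.0 * np.log10(syn[:n])
    per_frame = np.sqrt(np.mean(diff_db ** 2, axis=1))
    return float(np.mean(per_frame))
```

Under this definition a lower value means the synthesized spectrum is closer to the natural one, so a comparison of two synthesizers would compute the distance of each synthesized word from the same natural recording and prefer the smaller result. The measure is sensitive to overall level, so in practice the signals would likely be level-normalized first.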


Copyright

© 2023 D’Souza & Ravi. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)