Confusion Matrix Analysis of Syllable-Like Unit Extracted from Hindi Continuous Speech

Archana Balyan

doi:10.17485/ijst/2017/v10i19/113607

Indian Journal of Science and Technology

Year: 2017, Volume: 10, Issue: 19, Pages: 1-6

Original Article

Archana Balyan*

Maharaja Surajmal Institute of Technology, Affiliated to GGSIPU, Janak Puri − 110058, New Delhi, India; [email protected]

*Author for correspondence:
Archana Balyan
Maharaja Surajmal Institute of Technology, Affiliated to GGSIPU, Janak Puri − 110058, New Delhi, India; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: Speech segmentation is an important pre-processing task for improving the quality of text-to-speech (TTS) synthesis, and is therefore an important field of research. The basic requirement is to build segmented and labeled speech corpora. This paper describes the database, the segmentation technique, and a confusion matrix analysis of the segmentation of the syllable structures occurring in the database.

Methods: A set of 115 sentences, chosen to provide information to passengers of the Delhi metro rail and spoken by a single male speaker, was recorded. Analysis of the speech database shows that the syllable structures CV and CVC are the most frequent, covering 57.38% and 33.8% respectively, while the structures CCVC, C C, and CVCC (where C = consonant, V = vowel, and ‘ ’ represents a nasalized vowel sound) cover less than 2% of our database. The recorded sentences are segmented into syllable-like units using the group-delay algorithm. The segmentation technique has been tested on 704 syllable tokens occurring in our database. The evaluation has been undertaken in two parts: the segmentation algorithm and the human perception approach.

Findings: The quality of the segmentation, evaluated using a syllabic confusion matrix for the various syllable structures, shows segmentation accuracy rates of 79% and 88.5% for CV and CVC respectively. The overall segmentation accuracy achieved on the metro rail passenger information system (MRPIS) task was 80.6%.

Applications: The database can be used in speech synthesizers and speech recognizers.
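
As an illustration of how per-structure and overall accuracy figures of the kind reported above can be read off a syllabic confusion matrix, the following is a minimal Python sketch. It assumes the matrix is stored as nested dictionaries, with rows for the reference (hand-labeled) structure and columns for the structure produced by the segmenter. The counts, the "other" column, and the function names are hypothetical and chosen only for the example; they are not the paper's data or its implementation.

```python
from typing import Dict

# Hypothetical counts for illustration only; these are NOT the paper's data.
# Outer key: reference (hand-labeled) syllable structure.
# Inner key: structure produced by the automatic segmentation.
confusion: Dict[str, Dict[str, int]] = {
    "CV":  {"CV": 8, "CVC": 2, "other": 0},
    "CVC": {"CV": 1, "CVC": 8, "other": 1},
}


def per_class_accuracy(matrix: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Fraction of tokens of each reference structure segmented as that same structure."""
    result = {}
    for ref, row in matrix.items():
        total = sum(row.values())
        if total > 0:  # skip structures with no tokens to avoid division by zero
            result[ref] = row.get(ref, 0) / total
    return result


def overall_accuracy(matrix: Dict[str, Dict[str, int]]) -> float:
    """Correctly segmented tokens (diagonal entries) divided by all tokens."""
    correct = sum(row.get(ref, 0) for ref, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


acc = per_class_accuracy(confusion)
overall = overall_accuracy(confusion)
print(acc, overall)
```

The overall figure is weighted by how many tokens each structure contributes, so it need not equal the simple average of the per-structure accuracies when the structures are unevenly distributed, as they are in this database.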

Keywords: Database, Delhi Metro Rail Corporation, Group-Delay Algorithm, Metro Rail Passenger Information Corpus, Segmentation, Syllable