P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2017, Volume: 10, Issue: 1, Pages: 1-5

Original Article

Language Models Creation for the Tatar Speech Recognition System


Objectives: The article presents experiments on the creation of different language models for the Tatar language. N-gram statistical models are used with five different smoothing techniques. Methods: These models can be used in various applications: machine translation systems, spell checking, etc. This study intends to use the models in an automatic speech recognition system for Tatar. Because Tatar has a rich morphology, speech recognition systems may use not only words but also the building blocks of words as basic modeling units: syllables, morphemes, etc. Findings: The following basic units were chosen for a complete analysis of Tatar language model development: word, morpheme, morph (a statistically selected component of a word), stem and affix chain, syllable, and letter. Models were therefore constructed for all combinations of 2-, 3-, and 4-grams, smoothing techniques, and basic units. In addition, an experiment was conducted to show the feasibility of developing a language model based on word classes. Conclusion: Based on the experimental results, conclusions are drawn about the quality of the Tatar grammar description, the degree of lexicon coverage, and the required vocabulary size for each type of constructed model.

Keywords: Automatic Speech Recognition, Class-Based Models, Language Model, N-Grams, Tatar Language
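
The n-gram modeling described in the abstract can be illustrated with a minimal sketch. The abstract does not name its five smoothing techniques, so this example assumes add-one (Laplace) smoothing purely for illustration; it is not presented as one of the article's methods. The function names (`ngrams`, `train`, `prob`) and the toy sequences are hypothetical. The model is agnostic to the unit type: each training sequence is simply a list of units, which could be words, morphemes, syllables, or letters.

```python
from collections import Counter


def ngrams(units, n):
    """Return the n-grams (as tuples) of a unit sequence padded with boundary markers."""
    padded = ["<s>"] * (n - 1) + list(units) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def train(sequences, n):
    """Count n-grams and their (n-1)-unit histories, and collect the vocabulary."""
    ngram_counts = Counter()
    history_counts = Counter()
    vocab = {"</s>"}  # the end marker can be predicted, so it belongs to the vocabulary
    for units in sequences:
        vocab.update(units)
        for gram in ngrams(units, n):
            ngram_counts[gram] += 1
            history_counts[gram[:-1]] += 1
    return ngram_counts, history_counts, vocab


def prob(gram, model):
    """Add-one smoothed estimate of P(last unit | preceding units)."""
    ngram_counts, history_counts, vocab = model
    return (ngram_counts[gram] + 1) / (history_counts[gram[:-1]] + len(vocab))


# Toy placeholder sequences (not Tatar data): each list is one utterance split into units.
bigram_model = train([["a", "b"], ["a", "c"]], n=2)
print(prob(("a", "b"), bigram_model))  # seen bigram
print(prob(("a", "z"), bigram_model))  # unseen bigram still gets non-zero probability

# The same code applied to letters as units, with a 3-gram order.
letter_model = train([list("abab")], n=3)
print(prob(("a", "b", "a"), letter_model))
```

Changing the unit type only changes how the input is segmented before training, which is why one model family can be built for every combination of n-gram order and unit. Smaller units shrink the vocabulary but shorten the context each n-gram covers.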

