Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 44, Pages: 1-7
N. Radha1 *, A. Shahina1 and A. Nayeemulla Khan2
1Department of Information Technology, SSN College of Engineering, Chennai, India; radhan, [email protected] 2School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India; [email protected]
*Author for correspondence
Department of Information Technology, SSN College of Engineering, Chennai, India; radhan,[email protected]
Objectives: This paper proposes a method to improve the performance of a Visual Speech Recognition (VSR) system by combining pixel-based and geometry-based features, so as to augment audio-based Automatic Speech Recognition (ASR) systems in adverse conditions.
Methods/Statistical Analysis: A video database comprising 11000 utterances of isolated words, collected from 20 speakers, is used in this study. Pixel-based features (DCT and DWT) and geometric features (Active Shape Model, or ASM) are fused at two levels: the feature level and the decision level. A simple Gaussian mixture HMM word model is built for feature-level fusion, while a two-stream HMM model is built for decision-level fusion.
Findings: The VSR system built using the combined features shows a significant improvement in performance over the individual VSR systems built using pixel-based or geometric features alone. The individual systems achieve an accuracy of 76% for geometric (ASM) features, 64% for DCT and 72% for DWT pixel-based features. With feature-level fusion, accuracy improves to 80% for ASM+DCT and 84.7% for ASM+DWT. Weighted decision-level fusion results in a further improvement, with an accuracy of 84% for ASM+DCT and 92% for ASM+DWT.
Application/Improvements: The combined VSR system could be preferred over systems based on pixel or geometric features alone to augment the performance of audio-based ASR systems in adverse conditions. Further studies on improving the VSR system, which could be used in lieu of audio-based ASR systems in adverse situations, are being carried out.
Keywords: HMM, Pixel and Geometric Features, Visual Speech Recognition
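The two fusion levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `feature_level_fusion` and `decision_level_fusion` and the parameter `pixel_weight` are hypothetical, the abstract does not state the stream weights used, and the paper's two-stream HMM combines the streams inside the model, whereas this sketch only combines per-word log-likelihoods that are assumed to have been computed already by two separate models.

```python
from typing import List, Sequence, Tuple


def feature_level_fusion(
    pixel_feats: Sequence[Sequence[float]],
    geom_feats: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate per-frame pixel (e.g. DCT/DWT) and geometric (e.g. ASM) vectors.

    Both inputs are lists of frames; frame i of each stream is joined
    into one longer vector that a single word model could be trained on.
    """
    if len(pixel_feats) != len(geom_feats):
        raise ValueError("both streams must have the same number of frames")
    return [list(p) + list(g) for p, g in zip(pixel_feats, geom_feats)]


def decision_level_fusion(
    loglik_pixel: Sequence[float],
    loglik_geom: Sequence[float],
    pixel_weight: float,
) -> Tuple[int, List[float]]:
    """Combine per-word log-likelihoods from two streams with a weighted sum.

    pixel_weight is an assumed tuning parameter in [0, 1]; the geometric
    stream receives (1 - pixel_weight). Returns the index of the
    best-scoring word and the fused scores.
    """
    if len(loglik_pixel) != len(loglik_geom):
        raise ValueError("both streams must score the same vocabulary")
    if not loglik_pixel:
        raise ValueError("at least one word score is required")
    if not 0.0 <= pixel_weight <= 1.0:
        raise ValueError("pixel_weight must lie in [0, 1]")
    fused = [
        pixel_weight * p + (1.0 - pixel_weight) * g
        for p, g in zip(loglik_pixel, loglik_geom)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

In practice the weight would typically be tuned on held-out data. Setting `pixel_weight` to 1.0 or 0.0 reduces the decision to the pixel-only or geometry-only system, which gives a convenient baseline for comparison.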