Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 20, Pages: 1-10
M. A. Yusnita1*, M. P. Paulraj2, Sazali Yaacob3, R. Yusuf1 and M. Nor Fadzilah1
1 Faculty of Electrical Engineering, Universiti Teknologi MARA Malaysia, Permatang Pauh - 13500, Penang, Malaysia; [email protected]
2 School of Mechatronic Engineering, University Malaysia Perlis, Ulu Pauh - 02600, Perlis, Malaysia
3 Universiti Kuala Lumpur Malaysian Spanish Institute, Kulim Hitech Park, Kulim - 09000, Kedah, Malaysia
Standard speech feature extractors such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Prediction Coefficients (LPC) perform poorly under noisy conditions. This paper proposes two features that are less susceptible to noise to mitigate this deficiency. Statistical descriptors of Mel-Bands Spectral Energy (MBSE) are applied to the traditional filter-bank analysis; however, this technique increases the feature size. This issue is addressed by a transformation using principal component analysis (PCA), which generates a new PCA-MBSE feature set. Two types of utterances, namely isolated words and continuous speech, were elicited from 103 university volunteers in a controlled room to collect speech signals from the three main ethnic groups in Malaysia. The study employed two classifiers, K-nearest neighbors and artificial neural networks, to distinguish between the Malay, Chinese and Indian accents. Experimental results using the independent test samples technique indicated promising accuracy rates of 92.7% and 93.0% with the proposed PCA-MBSE features on the male and female datasets respectively. Under severely noisy conditions, the standard MFCC and LPC features deteriorated faster than the MBSE-based features. The PCA-MBSE features were the most robust: their performance deteriorated by only 17.1% and 13.6%, compared with 33.1% and 31.3% for the MBSE features, on the male and female datasets respectively. The LPC features fared worse, with deterioration rates of 40.2% and 32.7%, and the MFCC features deteriorated by 35.7% and 36.8%, for the male and female datasets respectively. In conclusion, Malaysian English is not a uniform English variety; it is colored by its diverse ethnic nuances. Incorporating accent analyzers based on the proposed techniques into automatic speech recognition could contribute a substantial improvement in noisy environments.
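The pipeline summarized in the abstract (mel-band energies, statistical descriptors, PCA reduction, K-nearest-neighbor classification) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' configuration: the choice of mean and standard deviation as descriptors, the 20 filters, the frame sizes, the five retained components, k = 3, and the synthetic tone-plus-noise signals that stand in for real speech. The function names are hypothetical.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):       # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):      # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mbse_descriptors(signal, sample_rate, n_filters=20, n_fft=256, hop=128):
    """Mean and standard deviation of log mel-band energy across frames.
    Assumption: the paper's exact descriptor set is not given in the abstract."""
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    window = np.hamming(n_fft)
    frames = np.array([signal[s:s + n_fft] * window
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    log_energy = np.log(power @ fbank.T + 1e-10)
    return np.concatenate([log_energy.mean(axis=0), log_energy.std(axis=0)])

def pca_fit(X, n_components):
    """Return the training mean and the top principal directions (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

def knn_predict(train_X, train_y, query_X, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    preds = []
    for q in query_X:
        dist = np.linalg.norm(train_X - q, axis=1)
        nearest = train_y[np.argsort(dist)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

def synthetic_utterance(rng, tone_hz, sample_rate=8000, duration=0.5):
    """Stand-in signal (NOT speech): a sine tone buried in Gaussian noise."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    return np.sin(2 * np.pi * tone_hz * t) + 0.5 * rng.standard_normal(t.size)

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, tone in enumerate([300.0, 1000.0, 2500.0]):  # three toy classes
        for _ in range(20):
            X.append(mbse_descriptors(synthetic_utterance(rng, tone), 8000))
            y.append(label)
    X, y = np.array(X), np.array(y)
    train, test = np.arange(0, len(y), 2), np.arange(1, len(y), 2)
    mean, components = pca_fit(X[train], n_components=5)  # fit PCA on training data only
    train_Z = pca_transform(X[train], mean, components)
    test_Z = pca_transform(X[test], mean, components)
    accuracy = float(np.mean(knn_predict(train_Z, y[train], test_Z) == y[test]))
    return train_Z.shape, accuracy

if __name__ == "__main__":
    shape, acc = run_demo()
    print("reduced training matrix:", shape, "toy accuracy:", acc)
```

The sketch fits the PCA projection on the training split alone and reuses it for the held-out split, which mirrors the idea of testing on independent samples. The toy accuracy it prints says nothing about the accent-recognition results reported above.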
Keywords: Accent Recognition, K-Nearest Neighbors, Linear Prediction Coefficients, Malaysian English, Mel-Bands Spectral Energy, Mel-Frequency Cepstral Coefficients, Principal Component Analysis