System for Fusion of Face and Speech Modalities Using DTCWT+QFT and MFCC+RASTA Techniques

H C Shanthakumar; G S Nagaraja; Mustafa Basthikodi

doi:10.17485/IJST/v14i42.1316

Indian Journal of Science and Technology

Year: 2021, Volume: 14, Issue: 42, Pages: 3144-3156

Original Article

H C Shanthakumar¹, G S Nagaraja², Mustafa Basthikodi³*

¹Computer Science and Engineering, SJBIT, (Research Scholar, Jain University), Bengaluru, Karnataka, India
²Computer Science and Engineering, RV College of Engineering (IEEE Senior Member), Bengaluru, Karnataka, India
³Computer Science and Engineering, Sahyadri College of Engineering & Management, Mangaluru, Karnataka, India

*Corresponding Author

Received Date: 16 July 2021, Accepted Date: 19 November 2021, Published Date: 10 December 2021

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: To propose a multimodal biometric system that fuses the face and speech modalities, using DTCWT+QFT techniques for face recognition and MFCC+RASTA techniques for speech recognition, and to compare its experimental performance with that of existing works.

Methods: The proposed model uses the DTCWT and QFT techniques to extract features from face images and then fuses the two. The MFCC and RASTA techniques are used to extract features from speech data, and fusion is then applied to them. Several databases are discussed and used for both the face and the speech recognition systems.

Findings: The experimental results are compared with existing systems, and the analysis indicates that the proposed system performs better. For face recognition, the fusion of DTCWT and QFT is implemented, and the results are tabulated for six face data sets using the performance parameters False Acceptance Ratio (FAR), False Rejection Ratio (FRR), Total Success Rate (TSR), Partial Error Rate (PER), and Equal Error Rate (EER). The average performance is compared with four existing fusion techniques, and the proposed system performs better. For speech recognition, the fusion of MFCC and RASTA is implemented, and performance is measured by accuracy, precision, recall, and F1-score. These results are compared with five different schemes and indicate that the proposed fusion of face and speech traits works better for human recognition.

Novelty: The fusion of two algorithms for face recognition is implemented and analysed, followed by the fusion of two algorithms for speech recognition. A novel approach is then presented that combines the face and speech recognition systems into a single multimodal biometric system to improve security.

Keywords: DTCWT; QFT; RASTA; MFCC; Feature Extraction; Fusion
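
The performance parameters named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes a verification setting in which each comparison yields a similarity score and a score at or above a threshold is accepted. The function names, the sample scores, and the simplification TSR = 1 - FRR are assumptions made for illustration only. PER is left out because its definition is not given here.

```python
def verification_rates(genuine, impostor, threshold):
    """Return FAR, FRR and TSR for one threshold.

    genuine  -- similarity scores of same-person comparisons
    impostor -- similarity scores of different-person comparisons
    A score >= threshold counts as "accepted" (assumed convention).
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    # Assumption: TSR is taken as the share of genuine attempts accepted.
    return {"FAR": far, "FRR": frr, "TSR": 1.0 - frr}


def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping every observed score as a threshold.

    Returns (eer, threshold) at the point where |FAR - FRR| is smallest;
    the EER estimate is the mean of FAR and FRR at that point.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        r = verification_rates(genuine, impostor, t)
        gap = abs(r["FAR"] - r["FRR"])
        if best is None or gap < best[0]:
            best = (gap, (r["FAR"] + r["FRR"]) / 2, t)
    return best[1], best[2]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1-score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical scores and labels, made up purely for illustration.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
print(verification_rates(genuine_scores, impostor_scores, 0.5))
print(equal_error_rate(genuine_scores, impostor_scores))
print(precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Sweeping only the observed scores keeps the sketch dependency-free. A finer estimate of the EER would interpolate between neighbouring thresholds.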

Copyright

© 2021 Shanthakumar et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)