P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2023, Volume: 16, Issue: 19, Pages: 1413-1421

Original Article

HiTEK Pre-processing for Speech and Text: NLP

Received Date: 14 February 2023, Accepted Date: 22 April 2023, Published Date: 12 May 2023


Objective: To develop a system that accepts a sentence spoken in a mix of two or four languages and converts it to text in a target language, termed the Cross Language Speech Identification and Text Translation (CLSITT) system.

Methods: A combinatorial model consisting of a Hidden Markov Model, Artificial Neural Networks, Deep Neural Networks and a Gaussian Mixture Model is used for direct and indirect speech mapping. The training dataset consists of a thousand phonemes for each of the Hindi, Telugu, English and Kannada languages, initially covering the bank and hospital domains; the grammatical phonemes of each language were added later, and wave files of cross-lingual spoken sentences were created. Because no cross-lingual vocal dataset was available, the dataset took six months to build from scratch. The Hindi language dataset Shabdanjali was also consulted. The basic parameters considered in creating the structured dataset are loudness, pause, pitch, tone, noise cancellation, sampling frequency, threshold, etc.

Findings: A comparative analysis of various techniques, target languages and features is tabulated. The research idea emerged from a comparative analysis of monolingual systems, which revealed a gap in cross-lingual speech-to-text translation. The architecture can be extended in future to other regional languages of India.

Novelty: A new benchmark cross-language dataset was created. This work presents the CLSITT tool, which transforms public speeches spoken in multiple languages into a selected target language; the tool is useful for regional news editors, rural and agricultural activities, medical applications, defence and so on.

Keywords: Artificial Intelligence (AI); Deep Learning (DL); Machine Learning (ML); Natural Language Processing (NLP)




© 2023 Rudrappa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee

