Novel Transfer Learning Attitude for Automatic Video Captioning Using Deep Learning Models

J Vaishnavi; V Narmatha

doi:10.17485/IJST/v15i43.1846

Indian Journal of Science and Technology

Year: 2022, Volume: 15, Issue: 43, Pages: 2325-2335

Original Article

J Vaishnavi¹*, V Narmatha²

¹Research Scholar, Department of Computer & Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu, India
²Assistant Professor, Department of Computer & Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu, India

*Corresponding Author
Email: [email protected]

Received Date: 13 September 2022, Accepted Date: 08 October 2022, Published Date: 17 November 2022

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: To generate captions for videos with low time complexity and high accuracy, and to create captions for each input video frame at specific timestamps. The approach can be used by police crime branches, and it can help hearing-impaired people follow what happens in a video. Methods: The proposed approach uses transfer learning. Modified Inception v3 and ResNet-50 networks are designed and their results are compared. The standard MSVD dataset is used to demonstrate the architectures, and their performance is compared using standard performance metrics. Findings: The Inception v3 model works better than the ResNet-50 architecture for the video captioning task. It achieves the best accuracy, 99.83%, on captions for the given input videos, higher than that of the ResNet-50 model. The MSVD dataset is found to be suitable for demonstrating the video captioning task. Novelty: Both proposed models are modified to suit the video captioning task. The aggregation of certain layers boosts the performance of the models beyond that of the ordinary (unmodified) models.
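The Objectives call for a caption attached to each input frame at a specific timestamp. The minimal Python sketch below illustrates only that bookkeeping step, under stated assumptions: the video has a constant frame rate, frames are already decoded into a sequence, and `caption_frame` is a hypothetical placeholder for a trained captioning model (for instance one built on the modified Inception v3 or ResNet-50 features). The names `TimedCaption`, `format_timestamp`, `caption_frames`, and the `every_n` sampling parameter are illustrative inventions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TimedCaption:
    frame_index: int  # position of the frame in the decoded sequence
    timestamp: str    # "HH:MM:SS.mmm" offset of the frame in the video
    caption: str      # text produced by the (hypothetical) captioner


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def caption_frames(
    frames: Sequence[object],
    fps: float,
    caption_frame: Callable[[object], str],
    every_n: int = 1,
) -> List[TimedCaption]:
    """Caption every `every_n`-th frame and attach its timestamp.

    Assumes a constant frame rate, so frame i starts at i / fps seconds.
    `caption_frame` is a hypothetical stand-in for a trained model.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if every_n < 1:
        raise ValueError("every_n must be >= 1")
    results: List[TimedCaption] = []
    for i in range(0, len(frames), every_n):
        results.append(
            TimedCaption(i, format_timestamp(i / fps), caption_frame(frames[i]))
        )
    return results


if __name__ == "__main__":
    # Toy "frames" and a dummy captioner, only to show the output shape.
    demo = caption_frames(["f0", "f1", "f2", "f3"], fps=2.0,
                          caption_frame=lambda f: f"caption for {f}", every_n=2)
    for item in demo:
        print(item.timestamp, item.caption)
    # 00:00:00.000 caption for f0
    # 00:00:01.000 caption for f2
```

Sampling every `every_n`-th frame rather than every frame is one simple way to trade caption density for lower computation; the abstract does not state how frames are actually sampled.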

Keywords: Artificial Intelligence; Automatic Captioning; Transfer Learning; Frames; Inception V3; Residual Network 50 Model

Copyright

© 2022 Vaishnavi & Narmatha. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee)