A Computational Meta-Learning Inspired Model for Sketch-based Video Retrieval

N Pavithra; Y H Sharath Kumar

doi:10.17485/IJST/v16i7.2121


Indian Journal of Science and Technology

Year: 2023, Volume: 16, Issue: 7, Pages: 476-484

Original Article

N Pavithra¹*, Y H Sharath Kumar²

¹Research Scholar, Department of Information Science & Engineering, Maharaja Institute of Technology, Visvesvaraya Technological University, Mysuru, India
²Professor and Head, Department of Information Science & Engineering, Maharaja Institute of Technology, Visvesvaraya Technological University, Mysuru, India

*Corresponding Author
Email: [email protected]

Received Date: 02 November 2022, Accepted Date: 25 January 2023, Published Date: 20 February 2023

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: To design and develop an efficient computing framework for sketch-based video retrieval (SBVR) using a fine-grained intrinsic computational approach.

Methods: The primary method adopts a multi-stream, multi-modality joint embedding approach for improved P-SBVR on an improved fine-grained dataset related to KTH and TSF. It focuses on computing the significant intrinsic visual appearance details of sketched objects. The extracted appearance-based and motion-based features are used to train three different CNN baselines under strong and weak supervision. The system also implements a meta-learning model for the different supervised settings to improve sketch-based video retrieval performance, along with a relational module to overcome the problem of overfitting.

Findings: The study derives specific sketch sequences from its formulated dataset to compute instance-level query processing for video retrieval. It also addresses the limitations of coarse-grained video retrieval models and of sketch-based still image retrieval. The aggregated, richly annotated dataset supported the experimental simulation. The experimental evaluation compares the 3D CNN baselines under strong and weak supervision: CNN BL-Type-2 attains a maximum video retrieval accuracy of 99.96% for the triplet grading feature under the relational schema, and CNN BL-Type-1 attains a maximum retrieval accuracy of 97.40% with the triplet grading features from the improved SBVR. The evaluation metric for instance-level retrieval also considers true matching of sketches with videos. It shows that appropriate appearance-based and motion-based feature selection enhances video retrieval accuracy up to 96.90%, with 99.28% accuracy in action identification for the motion stream, 98.17% for the appearance module, and 98.45% for the fusion module. The proposed work also addresses the cross-modality problem that arises when an object's visual appearance and its movement in particular video scenes are matched simultaneously. The experimental outcomes show effectiveness comparable to that of existing CNN-based systems.

Novelty: Unlike conventional sketch analysis systems, which focus mainly on static objects or scenes, the presented approach efficiently computes the important intrinsic visual appearance details of the object of interest from the sketch and then triggers the video retrieval operations. The proposed CNN-based learning model with the improved P-SBVR dataset attains improved retrieval computing times of approximately 200, 210, and 214 milliseconds for CNN BL-Type-1, CNN BL-Type-2, and CNN BL-Type-3, respectively, which are comparable with those of existing deep learning based SBVR models.

Keywords: Sketch Based Video Retrieval; Intrinsic Appearance Details; Meta Learning; Sketch Dataset; Cross Modality Problem
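
As a rough illustration of the instance-level retrieval idea summarized in the abstract, the following minimal sketch fuses appearance and motion embeddings, scores a triplet ranking loss, and measures how often a sketch query retrieves its true video. It is not the authors' implementation: all function names, the concatenation-based fusion, the Euclidean distance, and the margin value are assumptions chosen for illustration, and the embeddings are toy arrays rather than CNN outputs.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row vector to (approximately) unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def fuse_streams(appearance, motion):
    """Late fusion by concatenation (an assumption, not the paper's fusion module)."""
    fused = np.concatenate([l2_normalize(appearance), l2_normalize(motion)], axis=-1)
    return l2_normalize(fused)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on squared Euclidean distances (margin is hypothetical)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())


def top1_accuracy(sketch_emb, video_emb):
    """Fraction of sketch queries whose nearest video is the true match.

    Assumes sketch i and video i form the ground-truth pair.
    """
    diffs = sketch_emb[:, None, :] - video_emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    nearest = np.argmin(dists, axis=1)
    return float(np.mean(nearest == np.arange(len(sketch_emb))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for per-video appearance and motion features.
    video_app = rng.normal(size=(5, 8))
    video_mot = rng.normal(size=(5, 8))
    videos = fuse_streams(video_app, video_mot)
    # Toy sketch queries: noisy copies of the video features.
    sketches = fuse_streams(
        video_app + 0.05 * rng.normal(size=(5, 8)),
        video_mot + 0.05 * rng.normal(size=(5, 8)),
    )
    print("top-1 accuracy:", top1_accuracy(sketches, videos))
    print("triplet loss:", triplet_loss(sketches, videos, np.roll(videos, 1, axis=0)))
```

In this toy setup the negatives are simply the video embeddings shifted by one position. A real training pipeline would sample negatives from the dataset and backpropagate the loss through the sketch and video networks.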

Copyright

© 2023 Pavithra & Kumar. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)