P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology


Year: 2023, Volume: 16, Issue: 7, Pages: 476-484

Original Article

A Computational Meta-Learning Inspired Model for Sketch-based Video Retrieval

Received Date: 02 November 2022, Accepted Date: 25 January 2023, Published Date: 20 February 2023


Objectives: To design and develop an efficient computing framework for sketch-based video retrieval (SBVR) using a fine-grained intrinsic computational approach.

Methods: The primary method adopts a multi-stream, multi-modality joint embedding for improved P-SBVR, built from improved fine-grained KTH- and TSF-related datasets. It focuses on computing the significant intrinsic visual appearance details of sketched objects. The extracted appearance and motion-based features are used to train three different CNN baselines under strong and weak supervision. The system also implements a meta-learning model for the different supervised settings to improve sketch-based video retrieval performance, along with a relational module to overcome the problem of overfitting.

Findings: The study derives specific sketch sequences from its formulated dataset to compute instance-level query processing for video retrieval. It also addresses the limitations of coarse-grained video retrieval models and of sketch-based still image retrieval. The aggregated, richly annotated dataset supported the experimental simulation. The experimental evaluation covers the 3D CNN baselines under strong and weak supervision: CNN BL-Type-2 attains a maximum video retrieval accuracy of 99.96% for the triplet grading feature under the relational schema, and CNN BL-Type-1 attains a maximum retrieval accuracy of 97.40% with the triplet grading features from the improved SBVR. The evaluation of the instance-level retrieval process also considers true matching of sketches with videos. It shows that appropriate appearance- and motion-based feature selection raises video retrieval accuracy up to 96.90%, with action identification accuracy of 99.28% for the motion stream, 98.17% for the appearance module, and 98.45% for the fusion module. The proposed approach also addresses the cross-modality problem by simultaneously matching the visual appearance of an object with its movement in particular video scenes. The experimental outcome shows effectiveness comparable to existing CNN-based systems.

Novelty: Unlike conventional sketch analysis, which focuses mainly on static objects or scenes, the presented approach efficiently computes the important intrinsic visual appearance details of the object of interest from the sketch and then triggers the video retrieval operations. The proposed CNN-based learning model with the improved P-SBVR dataset attains better retrieval computing times, approximately 200, 210, and 214 milliseconds for CNN BL-Type-1, CNN BL-Type-2, and CNN BL-Type-3, respectively, which are comparable with existing deep learning based SBVR models.
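The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the general ideas it names: fusing appearance and motion features, a triplet-style ranking objective, and instance-level retrieval accuracy measured by true sketch-to-video matching. The function names, the weighted-average fusion, the Euclidean distance, and the margin value are all assumptions made for illustration and are not taken from the article.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuse(appearance, motion, weight=0.5):
    """Late fusion as a weighted average of two streams (assumed scheme)."""
    return [weight * a + (1.0 - weight) * m for a, m in zip(appearance, motion)]


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: the sketch (anchor) should be closer to its
    matching video (positive) than to a non-matching one (negative) by at
    least `margin`. The margin value is an assumption."""
    return max(0.0, l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin)


def top1_accuracy(sketch_embs, video_embs):
    """Instance-level retrieval accuracy, assuming sketch i's true match is
    video i: the fraction of sketches whose nearest video is the true one."""
    hits = 0
    for i, sketch in enumerate(sketch_embs):
        nearest = min(range(len(video_embs)),
                      key=lambda j: l2_distance(sketch, video_embs[j]))
        hits += int(nearest == i)
    return hits / len(sketch_embs)


if __name__ == "__main__":
    # Toy embeddings, invented purely for demonstration.
    videos = [fuse([0.0, 0.2], [0.2, 0.0]), fuse([1.0, 0.8], [0.8, 1.0])]
    sketches = [[0.0, 0.0], [1.0, 1.0]]
    print("top-1 accuracy:", top1_accuracy(sketches, videos))
    print("triplet loss:", triplet_loss(sketches[0], videos[0], videos[1]))
```

In a learned system the embeddings would come from the trained CNN streams rather than hand-written lists, and the loss would be minimized over many sketch-video triplets; the toy values here only show how the pieces fit together.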

Keywords: Sketch Based Video Retrieval; Intrinsic Appearance Details; Meta Learning; Sketch Dataset; Cross Modality Problem




© 2023 Pavithra & Kumar. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

