P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology


Year: 2024, Volume: 17, Issue: 21, Pages: 2177-2198

Original Article

Multi-dimensional CNN Based Feature Extraction with Feature Fusion and SVM for Human Activity Recognition in Surveillance Videos

Received Date: 21 December 2023, Accepted Date: 24 April 2024, Published Date: 21 May 2024


Background/Objectives: Accurately recognizing human activities in video sequences is challenging because of low resolution, cluttered backgrounds, partial occlusion, and varying viewpoints. Automated, machine learning (ML) based human activity recognition (HAR) from surveillance videos calls for the fusion of several feature extraction techniques.

Methods: This paper uses a support vector machine (SVM) with feature fusion for automatic activity recognition from surveillance videos. A Histogram of Oriented Gradients (HOG) is used to segment each input frame, separating humans from other objects and background noise. Multiple features are then extracted using the Gabor Wavelet Transform (GWT), autocorrelogram, Gray-Level Co-Occurrence Matrix (GLCM), HSV histogram, and a multi-dimensional CNN. The proposed approach is implemented in MATLAB and compared with existing approaches such as Space-Time Interest Points (STIP) and Histogram of Optical Flow (HOF).

Findings: The proposed approach outperforms the existing approaches, with lower time consumption and higher accuracy: 99.886% on the UCF101 dataset and 99.538% on the UTKinect dataset.

Novelty: Feature-level fusion yields the most discriminative feature information, from which the classification algorithm recognizes the various human actions.
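The feature-level fusion described in the abstract can be illustrated with a small sketch. The abstract does not specify how the authors' MATLAB implementation combines its descriptors, so the version below assumes the common approach of normalizing each descriptor block and concatenating the results. It covers only two of the listed descriptors (HSV histogram and GLCM statistics), uses a single horizontal GLCM offset, and runs on toy data; the function names, bin counts, and gray-level count are all hypothetical choices made for illustration.

```python
import colorsys
import math


def hsv_histogram(pixels, bins=(4, 2, 2)):
    """Normalized joint HSV histogram of RGB pixels with channels in [0, 1]."""
    hb, sb, vb = bins  # bin counts are illustrative, not taken from the paper
    hist = [0.0] * (hb * sb * vb)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        # min(...) keeps a channel value of exactly 1.0 inside the last bin
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
    total = sum(hist)
    return [c / total for c in hist] if total else hist


def glcm_features(gray, levels=4):
    """Contrast, energy, and homogeneity from a horizontal-neighbor GLCM.

    gray is a 2-D list of integer gray levels in [0, levels).
    """
    glcm = [[0.0] * levels for _ in range(levels)]
    for row in gray:
        for a, b in zip(row, row[1:]):  # each pixel and its right neighbor
            glcm[a][b] += 1.0
    total = sum(map(sum, glcm)) or 1.0  # avoid dividing by zero on tiny inputs
    contrast = energy = homogeneity = 0.0
    for i in range(levels):
        for j in range(levels):
            p = glcm[i][j] / total
            contrast += p * (i - j) ** 2
            energy += p * p
            homogeneity += p / (1 + abs(i - j))
    return [contrast, energy, homogeneity]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse_features(*blocks):
    """Assumed fusion rule: L2-normalize each descriptor block, then concatenate."""
    fused = []
    for block in blocks:
        fused.extend(l2_normalize(block))
    return fused


# Toy 6-pixel color "frame" and a 2x3 quantized grayscale patch
rgb = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
       (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
gray = [[0, 1, 2],
        [3, 0, 1]]

hsv_block = hsv_histogram(rgb)
glcm_block = glcm_features(gray)
fused = fuse_features(hsv_block, glcm_block)
```

Normalizing each block before concatenation keeps a long descriptor such as the histogram from dominating a short one such as the GLCM statistics. In a full pipeline, one fused vector per video segment would be passed to the SVM classifier for training and prediction.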

Keywords: Human activity recognition, Machine Learning, Surveillance Videos, Human detection algorithm, Feature extraction, SVM classifier




© 2024 Shah & Holia. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

