Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I

Research Article

Learning Human Actions: A Walk Through

K. Jagrutha Aditya1,*, K. Narendra1, D. Karthik Reddy1, T. Anil1, Eva Patel1
  • 1: Vignan Foundation for Science Technology and Research, India
*Contact email: kja8586@gmail.com

Abstract

Human action recognition in video is essential for many intelligent systems, from surveillance to medical applications. In this paper, we present a comparative evaluation of four deep neural network (DNN) architectures for learning and recognizing human actions: 3D Convolutional Neural Networks (3D CNN), 3D CNN with a ResNet backbone, Inflated 3D ConvNet (I3D), and the Video Vision Transformer (ViViT). The architectures are compared by how well they learn the rich spatiotemporal representations needed to understand dynamic human activities. By examining the performance and shortcomings of each design, this study offers insight into the evolving landscape of video-based human activity recognition (HAR) and indicates the advantages of transformer-based attention mechanisms over conventional convolutional approaches.
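As an illustration of one idea behind the architectures named above, the following is a minimal sketch of the kernel "inflation" that gives I3D its name: a 2D filter is repeated along the time axis and rescaled by the temporal kernel size, so that a clip of identical frames produces the same response as the original 2D filter on a single frame. This is not the authors' implementation. The function names (`conv2d_valid`, `inflate_kernel`, `conv3d_valid`) are hypothetical, and the code assumes single-channel frames, stride 1, and no padding, written in plain Python for readability rather than speed.

```python
from typing import List

Frame = List[List[float]]            # H x W single-channel frame
Kernel2D = List[List[float]]         # kH x kW spatial filter
Kernel3D = List[List[List[float]]]   # kT x kH x kW spatiotemporal filter


def conv2d_valid(frame: Frame, kernel: Kernel2D) -> Frame:
    """2D cross-correlation with stride 1 and no padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(frame) - k_h + 1
    out_w = len(frame[0]) - k_w + 1
    return [
        [
            sum(frame[i + a][j + b] * kernel[a][b]
                for a in range(k_h) for b in range(k_w))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def inflate_kernel(kernel: Kernel2D, k_t: int) -> Kernel3D:
    """Repeat a 2D kernel k_t times along time and rescale by 1 / k_t."""
    return [[[w / k_t for w in row] for row in kernel] for _ in range(k_t)]


def conv3d_valid(clip: List[Frame], kernel: Kernel3D) -> List[Frame]:
    """3D cross-correlation over (time, height, width), stride 1, no padding."""
    k_t, k_h, k_w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t = len(clip) - k_t + 1
    out_h = len(clip[0]) - k_h + 1
    out_w = len(clip[0][0]) - k_w + 1
    return [
        [
            [
                sum(clip[t + c][i + a][j + b] * kernel[c][a][b]
                    for c in range(k_t)
                    for a in range(k_h)
                    for b in range(k_w))
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]
        for t in range(out_t)
    ]


# Toy data (assumed for illustration): a 4 x 4 frame repeated into a static clip.
frame = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
kernel_2d = [[1.0, 0.0], [0.0, -1.0]]
static_clip = [frame] * 3

response_2d = conv2d_valid(frame, kernel_2d)
response_3d = conv3d_valid(static_clip, inflate_kernel(kernel_2d, k_t=2))
```

On this static clip, every time step of `response_3d` matches `response_2d`, which is the property that lets an inflated network start from 2D image-pretrained weights. A clip with real motion would produce responses that differ across time steps.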

Keywords
human activity recognition, 3d convolutional neural network, resnet, spatiotemporal features, inflated 3d convnet, video vision transformer
Published
2025-10-13
Publisher
EAI
http://dx.doi.org/10.4108/eai.28-4-2025.2357848
Copyright © 2025 EAI