
Research Article
Learning Human Actions: A Walk Through
@INPROCEEDINGS{10.4108/eai.28-4-2025.2357848,
  author={K. Jagrutha Aditya and K. Narendra and D. Karthik Reddy and T. Anil and Eva Patel},
  title={Learning Human Actions: A Walk Through},
  proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I},
  publisher={EAI},
  proceedings_a={ICITSM PART I},
  year={2025},
  month={10},
  keywords={human activity recognition; 3d convolutional neural network; resnet; spatiotemporal features; inflated 3d convnet; video vision transformer},
  doi={10.4108/eai.28-4-2025.2357848}
}
K. Jagrutha Aditya
K. Narendra
D. Karthik Reddy
T. Anil
Eva Patel
Year: 2025
Learning Human Actions: A Walk Through
ICITSM PART I
EAI
DOI: 10.4108/eai.28-4-2025.2357848
Abstract
Human action recognition (HAR) in video is essential for numerous intelligent systems, ranging from surveillance to medical applications. In this contribution, we present a comparative evaluation of four Deep Neural Network (DNN) architectures for efficiently learning and recognizing human actions: the 3D Convolutional Neural Network (3D CNN), the 3D CNN with a ResNet backbone, the Inflated 3D ConvNet (I3D), and the Video Vision Transformer (ViViT). These architectures are compared in terms of their ability to learn the rich spatio-temporal representations needed to understand dynamic human activities. By examining the performance and shortcomings of each design, this study offers insights into the evolving landscape of video-based HAR and indicates the advantages of transformer-based attention mechanisms over conventional convolutional methodologies.
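
To make the convolutional end of this comparison concrete, the sketch below (not taken from the paper; it assumes PyTorch is installed, and the class name Tiny3DCNN and all layer sizes are illustrative choices) shows how a minimal 3D CNN consumes a video clip tensor and learns spatio-temporal features by convolving over time as well as space:

    # Illustrative sketch only, not the authors' implementation:
    # a minimal 3D CNN over clips shaped (batch, channels, frames, height, width).
    import torch
    import torch.nn as nn

    class Tiny3DCNN(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                # 3D kernels slide along the temporal axis as well as the
                # spatial axes, capturing motion cues that 2D kernels miss.
                nn.Conv3d(3, 16, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=2),      # halves frames, height, width
                nn.Conv3d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d(1),          # global spatio-temporal pooling
            )
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x).flatten(1))

    # A batch of two 16-frame RGB clips at 112x112 resolution.
    clip = torch.randn(2, 3, 16, 112, 112)
    logits = Tiny3DCNN(num_classes=10)(clip)
    print(logits.shape)  # torch.Size([2, 10])

The other architectures in the study build on this same clip-tensor interface: the ResNet variant swaps the plain stack for residual 3D blocks, I3D inflates pretrained 2D kernels into 3D, and ViViT replaces convolution entirely with attention over spatio-temporal patch tokens.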