
Research Article
Learning Human Actions: A Walk Through
@INPROCEEDINGS{10.4108/eai.28-4-2025.2357848,
  author={K. Jagrutha Aditya and K. Narendra and D. Karthik Reddy and T. Anil and Eva Patel},
  title={Learning Human Actions: A Walk Through},
  proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I},
  publisher={EAI},
  proceedings_a={ICITSM PART I},
  year={2025},
  month={10},
  keywords={human activity recognition; 3d convolutional neural network; resnet; spatiotemporal features; inflated 3d convnet; video vision transformer},
  doi={10.4108/eai.28-4-2025.2357848}
}
K. Jagrutha Aditya
K. Narendra
D. Karthik Reddy
T. Anil
Eva Patel
Year: 2025
Learning Human Actions: A Walk Through
ICITSM PART I
EAI
DOI: 10.4108/eai.28-4-2025.2357848
Abstract
Human action recognition (HAR) in video is essential for numerous intelligent systems, ranging from surveillance to medical applications. In this contribution, we present a comparative evaluation of four Deep Neural Network (DNN) architectures for efficiently learning and recognizing human actions: the 3D Convolutional Neural Network (3D CNN), the 3D CNN with a ResNet backbone, the Inflated 3D ConvNet (I3D), and the Video Vision Transformer (ViViT). These architectures are compared in terms of their ability to learn the rich spatio-temporal representations needed to understand dynamic human activities. By examining the performance and shortcomings of each design, this study offers insights into the evolving landscape of video-based HAR and indicates the advantages of transformer-based attention mechanisms over conventional convolutional methodologies.
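
To make the convolutional end of this comparison concrete, the sketch below (not taken from the paper; it assumes PyTorch is installed, and the class name Tiny3DCNN and all layer sizes are illustrative choices) shows how a minimal 3D CNN consumes a video clip tensor and learns spatio-temporal features by convolving over time as well as space:

    # Illustrative sketch only, not the authors' implementation:
    # a minimal 3D CNN over clips shaped (batch, channels, frames, height, width).
    import torch
    import torch.nn as nn

    class Tiny3DCNN(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                # 3D kernels slide along the temporal axis as well as the
                # spatial axes, capturing motion cues that 2D kernels miss.
                nn.Conv3d(3, 16, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=2),      # halves frames, height, width
                nn.Conv3d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d(1),          # global spatio-temporal pooling
            )
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x).flatten(1))

    # A batch of two 16-frame RGB clips at 112x112 resolution.
    clip = torch.randn(2, 3, 16, 112, 112)
    logits = Tiny3DCNN(num_classes=10)(clip)
    print(logits.shape)  # torch.Size([2, 10])

The other architectures in the study build on this same clip-tensor interface: the ResNet variant swaps the plain stack for residual 3D blocks, I3D inflates pretrained 2D kernels into 3D, and ViViT replaces convolution entirely with attention over spatio-temporal patch tokens.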