
Research Article
Effective Object detection and Tracking using Attention-Driven YOLO v9 Model with Multi-Stage Cascaded Convolutional Model
@ARTICLE{10.4108/eetiot.8231,
  author={Krishna Mohan A and P. V. N. Reddy and K. Satyma Prasad},
  title={Effective Object detection and Tracking using Attention-Driven YOLO v9 Model with Multi-Stage Cascaded Convolutional Model},
  journal={EAI Endorsed Transactions on Internet of Things},
  volume={11},
  number={1},
  publisher={EAI},
  journal_a={IOT},
  year={2025},
  month={6},
  keywords={Object Detection, Tracking, Vehicle, CNN, YOLO v9, Deep learning, Attention mechanism, Spatial mechanism},
  doi={10.4108/eetiot.8231}
}
Authors: Krishna Mohan A, P. V. N. Reddy, K. Satyma Prasad
Year: 2025
Journal: EAI Endorsed Transactions on Internet of Things (IOT)
Publisher: EAI
DOI: 10.4108/eetiot.8231
Abstract
INTRODUCTION: Object detection and tracking are essential tasks in computer vision, particularly for vehicle monitoring in digital images and video streams. Traditional methods, such as background subtraction and template matching, rely on heuristic algorithms and handcrafted features, which often struggle with diverse vehicle appearances and complex backgrounds. These techniques, while foundational, are limited in flexibility and scalability, resulting in lower accuracy and high computational cost. OBJECTIVES: In contrast, advanced Deep Learning (DL) approaches, particularly those using Convolutional Neural Networks (CNNs), have revolutionized the field by enabling automatic feature extraction from large datasets. Despite their advantages, existing DL models such as You Only Look Once (YOLO) have difficulty detecting small or closely packed vehicles and can be computationally intensive. METHODS: This study proposes an Attention-Driven YOLO v9 architecture that integrates a proposed mechanism combining spatial and channel attention to detect small vehicles accurately. RESULTS: Additionally, the architecture incorporates multi-stage cascaded convolution layers to enhance feature extraction and robustness against occlusion and background noise. The model is trained on the UA-DETRAC dataset, which provides a rich set of images for learning. CONCLUSION: Performance evaluation metrics such as Mean Average Precision (mAP), precision, recall, and tracking accuracy demonstrate significant improvement over traditional methods and existing state-of-the-art models. This research contributes to the field by addressing the limitations of previous studies through techniques that improve both speed and accuracy in vehicle detection and tracking.
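For readers unfamiliar with the detection metrics named in the abstract, the following is a minimal sketch of how precision and recall are commonly computed for bounding-box detections. It is not the authors' evaluation code. The box format `(x1, y1, x2, y2)`, the IoU threshold of 0.5, the greedy score-ordered matching, and the function names `iou` and `precision_recall` are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) (assumed format)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Greedy matching of detections to ground-truth boxes.

    detections: list of (score, box); ground_truth: list of boxes.
    The 0.5 threshold is an illustrative assumption, not a value from the paper.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    # Visit detections from highest to lowest confidence.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    fp = len(detections) - tp    # detections with no matching ground truth
    fn = len(ground_truth) - tp  # ground-truth boxes that were missed
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall
```

Mean Average Precision builds on the same matching step: precision is evaluated at successive recall levels as the confidence threshold is lowered, the resulting curve is averaged per class, and the per-class values are then averaged.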
Copyright © 2025 Krishna Mohan A et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.