
Research Article
Real Time Object Detection Using Fusion YOLO
@INPROCEEDINGS{10.4108/eai.28-4-2025.2357864,
  author={A Hemanth and T Hemandra and G Adi Narayana Reddy and P Leela Venkata Siva Sai and Sivadi Balakrishna},
  title={Real Time Object Detection Using Fusion YOLO},
  proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I},
  publisher={EAI},
  proceedings_a={ICITSM PART I},
  year={2025},
  month={10},
  keywords={classification object detection yolo deep learning},
  doi={10.4108/eai.28-4-2025.2357864}
}
A Hemanth
T Hemandra
G Adi Narayana Reddy
P Leela Venkata Siva Sai
Sivadi Balakrishna
Year: 2025
ICITSM PART I
EAI
DOI: 10.4108/eai.28-4-2025.2357864
Abstract
Object detection has become an essential part of many technological applications, from autonomous navigation to environmental monitoring. YOLO (You Only Look Once) architectures have transformed real-time detection; nonetheless, they still struggle with small objects, complex scenes, and specialized domains. This paper presents Fusion YOLO, a hybrid technique that combines the efficient detection framework of YOLO with Vision Transformer (ViT) feature extraction. By using transformer-based features to enrich input representations, Fusion YOLO substantially improves detection accuracy while remaining computationally efficient. Our approach uses pre-trained ViT-tiny models to extract 192-dimensional feature vectors, which are then processed by a dedicated classification head. Experiments on the TACO waste detection dataset show significant gains in recall and precision over conventional methods, and the model achieves good classification accuracy at little additional computational cost. The method shows how transformer-based feature extraction can be combined with CNN-based detection techniques to overcome their inherent limitations, and it offers a scalable solution for domain-specific object detection problems.
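The abstract says that 192-dimensional ViT-tiny feature vectors are passed to a dedicated classification head but does not describe the head itself. The sketch below illustrates one possible reading, assuming the simplest case of a single linear layer followed by softmax. The class count, the random weights, the stand-in feature vector, and the names `make_linear_head` and `classify` are hypothetical and are not taken from the paper.

```python
import math
import random

FEATURE_DIM = 192   # ViT-tiny feature size stated in the abstract
NUM_CLASSES = 5     # hypothetical: the abstract does not give the class count


def make_linear_head(in_dim, out_dim, seed=0):
    """Return (weights, biases) for one linear layer with small random weights."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases):
    """Map one feature vector to a list of class probabilities."""
    if len(features) != len(weights[0]):
        raise ValueError("feature vector has the wrong dimension")
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Stand-in for a ViT-tiny feature vector (assumption: a real pipeline
# would obtain this from the pre-trained backbone, not from random noise).
feature_rng = random.Random(1)
features = [feature_rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]

weights, biases = make_linear_head(FEATURE_DIM, NUM_CLASSES)
probs = classify(features, weights, biases)
predicted = max(range(NUM_CLASSES), key=lambda i: probs[i])
```

With untrained weights the predicted class is meaningless; the sketch only shows the shape of the computation, one 192-dimensional vector in and one probability per class out. In practice the head would be trained on labelled data such as TACO.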