
Research Article
ViT-YOLO: A Hybrid Transformer-CNN Approach for Real-Time Fire and Garbage Detection in Urban Surveillance
@INPROCEEDINGS{10.4108/eai.28-4-2025.2357941,
  author={K Leena Priya and G Selva Kumaran and Logie S and Poorani C and R Mari Selvan},
  title={ViT-YOLO: A Hybrid Transformer-CNN Approach for Real-Time Fire and Garbage Detection in Urban Surveillance},
  proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I},
  publisher={EAI},
  proceedings_a={ICITSM PART I},
  year={2025},
  month={10},
  keywords={index terms secure messaging end-to-end encryption cryptographic key exchange data protection privacy secret communication encryption algorithms},
  doi={10.4108/eai.28-4-2025.2357941}
}
K Leena Priya
G Selva Kumaran
Logie S
Poorani C
R Mari Selvan
Year: 2025
ICITSM PART I
EAI
DOI: 10.4108/eai.28-4-2025.2357941
Abstract
Contemporary security systems increasingly address public safety and environmental cleanliness. This project describes a real-time, AI-enabled surveillance system that detects fire and garbage in CCTV streams or uploaded video files. Using deep learning models based on EfficientNet, a Convolutional Neural Network (CNN), and YOLOv8, the system classifies video frames into three categories: “No Fire/No Garbage,” “Fire Detected,” and “Garbage Detected.” Frames are resized and preprocessed (e.g., color correction) to improve detection accuracy, and when fire or garbage is detected the system sends an e-mail to the authorities so they can respond immediately. The system is built with TensorFlow and Keras and provides a user-friendly Streamlit interface for live webcam analysis and video upload. The model was trained on a large dataset so that it can detect fire and garbage across a wide range of scenes. Because it operates in real time, the system can also be used to estimate crowd density on the fly, which supports mass surveillance in smart cities, industrial parks, and public places.
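The alerting flow described in the abstract (per-frame classification into three categories, followed by an e-mail to the authorities) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the score format, and the `min_consecutive` debouncing rule are all assumptions introduced here, and the trained model is replaced by hard-coded example scores.

```python
from email.message import EmailMessage
from typing import Iterable, Iterator, Sequence, Tuple

# The three categories named in the abstract.
LABELS = ("No Fire/No Garbage", "Fire Detected", "Garbage Detected")


def label_from_scores(scores: Sequence[float]) -> str:
    """Map one frame's class scores (hypothetical model output) to a label."""
    if len(scores) != len(LABELS):
        raise ValueError("expected one score per label")
    best = max(range(len(LABELS)), key=lambda i: scores[i])
    return LABELS[best]


def find_alerts(
    labels: Iterable[str], min_consecutive: int = 3
) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, label) once per run of identical hazard labels.

    Assumption (not from the abstract): an alert fires only after
    `min_consecutive` matching frames, so a single misclassified frame
    does not trigger an e-mail.
    """
    previous, run_length = None, 0
    for index, label in enumerate(labels):
        run_length = run_length + 1 if label == previous else 1
        previous = label
        if label != LABELS[0] and run_length == min_consecutive:
            yield index, label


def build_alert_email(
    label: str, frame_index: int, sender: str, recipient: str
) -> EmailMessage:
    """Build (but do not send) the notification message."""
    message = EmailMessage()
    message["Subject"] = f"Surveillance alert: {label}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        f"{label} at frame {frame_index}. Please review the video feed."
    )
    return message


# Example scores standing in for real model predictions.
example_scores = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.60, 0.30],
    [0.10, 0.70, 0.20],
    [0.80, 0.10, 0.10],
]
example_labels = [label_from_scores(s) for s in example_scores]
example_alerts = list(find_alerts(example_labels))
```

In a deployment, the message returned by `build_alert_email` could be handed to `smtplib.SMTP(...).send_message(message)`; the server address and credentials depend on the installation and are left out here.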