
Research Article
A Multi-Model Video Summarization Framework Integrating Feature Extraction, Embedding and Transformer-Based Learning
@INPROCEEDINGS{10.4108/eai.28-4-2025.2357767,
  author={Sahaya Sakila V and Chitransh Nishad and Muthangi Shashank and Tarun Prithi Gopinath},
  title={A Multi-Model Video Summarization Framework Integrating Feature Extraction, Embedding and Transformer-Based Learning},
  proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I},
  publisher={EAI},
  proceedings_a={ICITSM PART I},
  year={2025},
  month={10},
  keywords={video summarization deep learning openai whisper faiss pytorch resnet50 semantic embedding benchmarking},
  doi={10.4108/eai.28-4-2025.2357767}
}
Sahaya Sakila V
Chitransh Nishad
Muthangi Shashank
Tarun Prithi Gopinath
Year: 2025
ICITSM PART I
EAI
DOI: 10.4108/eai.28-4-2025.2357767
Abstract
Video summarization is important for managing large volumes of video across domains such as media, education, and surveillance. Traditional summarization approaches rely on keyframe selection and clustering, which fail to capture temporal dependencies and semantic context and therefore produce incomplete or redundant summaries. To address these limitations, the proposed method, VidSynape, introduces a multi-model video summarization framework that combines frame-level analysis with insights from the transcript. It uses deep feature embeddings to represent visual content and efficient similarity-based indexing to improve scalability and speed. By combining multiple models, the approach improves summary quality, contextual coherence, and computational efficiency. The system is evaluated on the SumMe and TVSum datasets, with results showing better performance than other methods. It generates high-quality summaries while reducing processing time, making it a practical solution for real-world video analysis applications.