A Multi-Model Video Summarization Framework Integrating Feature Extraction, Embedding and Transformer-Based Learning

Sahaya Sakila V; Chitransh Nishad; Muthangi Shashank; Tarun Prithi Gopinath

Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I

Research Article

Cite (BibTeX):

@INPROCEEDINGS{10.4108/eai.28-4-2025.2357767,
    author={Sahaya Sakila V and Chitransh Nishad and Muthangi Shashank and Tarun Prithi Gopinath},
    title={A Multi-Model Video Summarization Framework Integrating Feature Extraction, Embedding and Transformer-Based Learning},
    proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I},
    publisher={EAI},
    proceedings_a={ICITSM PART I},
    year={2025},
    month={10},
    keywords={video summarization deep learning openai whisper faiss pytorch resnet50 semantic embedding benchmarking},
    doi={10.4108/eai.28-4-2025.2357767}
}

Sahaya Sakila V¹,*, Chitransh Nishad¹, Muthangi Shashank¹, Tarun Prithi Gopinath¹

1: SRM Institute of Science and Technology

*Contact email: sahayasv2@srmist.edu.in

Abstract

Video summarization is important for managing large volumes of video across domains such as media, education and surveillance. Traditional summarization approaches rely on keyframe selection and clustering, which fail to capture temporal dependencies and semantic context and therefore produce incomplete or redundant summaries. To address these limitations, the proposed method, VidSynape, introduces a multi-model video summarization framework that combines frame-level analysis with insights from the transcript. It uses deep feature embeddings to represent visual content and efficient similarity-based indexing to enhance scalability and speed. By using multi-model techniques, this approach improves summary quality, contextual coherence and computational efficiency. The system is tested on the SumMe and TVSum datasets, with results showing better performance than other methods. It generates quality summaries while reducing processing time, making it suitable for real-world video analysis applications.

Keywords: video summarization, deep learning, openai whisper, faiss, pytorch, resnet50, semantic embedding, benchmarking
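
The abstract names deep feature embeddings and similarity-based indexing but does not describe the pipeline in detail. As a rough illustration only, and not the authors' implementation, the sketch below assumes per-frame embedding vectors are already available (for example from a ResNet50 backbone) and drops frames that are near-duplicates of frames already kept. The function names, the greedy strategy and the 0.9 threshold are all hypothetical choices made for this example; a real system would likely replace the linear scan with an index such as FAISS.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_keyframes(embeddings, threshold=0.9):
    """Greedy redundancy filter (hypothetical helper, not from the paper).

    A frame is kept only if its similarity to every previously kept
    frame is below `threshold`. Returns the indices of kept frames.
    """
    kept = []
    for idx, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[k]) < threshold for k in kept):
            kept.append(idx)
    return kept


# Toy 3-dimensional "embeddings": frames 1 and 3 are near-duplicates
# of frames 0 and 2 respectively.
frames = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],
    [0.0, 0.0, 1.0],
]
selected = select_keyframes(frames, threshold=0.9)
print(selected)
```

The linear scan over kept frames is quadratic in the worst case, which is the cost that an approximate or exact nearest-neighbour index is meant to reduce for long videos.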

Published: 2025-10-13
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.28-4-2025.2357767