Research Article
Transformer-Guided Video Inpainting Algorithm Based on Local Spatial-Temporal Joint
@ARTICLE{10.4108/eetel.3156,
  author={Jing Wang and ZongJu Yang},
  title={Transformer-Guided Video Inpainting Algorithm Based on Local Spatial-Temporal Joint},
  journal={EAI Endorsed Transactions on e-Learning},
  volume={8},
  number={4},
  publisher={EAI},
  journal_a={EL},
  year={2024},
  month={12},
  keywords={video inpainting algorithm, flow-guided, attention mechanism, spatial-temporal transformer, Deep Flow Network, video target removal},
  doi={10.4108/eetel.3156}
}
- Jing Wang
- ZongJu Yang
Year: 2024
Journal: EAI Endorsed Transactions on e-Learning (EL)
Publisher: EAI
DOI: 10.4108/eetel.3156
Abstract
INTRODUCTION: Video inpainting is an important task in computer vision and a key component of many practical applications, including video occlusion removal, traffic monitoring, and old-film restoration. Its goal is to fill missing regions with plausible content drawn from the video sequence while maintaining temporal continuity and spatial consistency.
OBJECTIVES: In previous studies, the complex scenes encountered in video inpainting often contain fast-moving objects or moving background objects, which cause optical flow estimation to fail; as a result, current video inpainting algorithms do not yet meet the requirements of practical applications. To avoid optical flow failure, this paper proposes a transformer-guided video inpainting model based on a local spatial-temporal joint.
METHODS: First, to exploit the rich spatial-temporal relationships between local flows, a Local Spatial-Temporal Joint Network (LSTN), consisting of an encoder, a decoder, and a transformer module, is designed to coarsely inpaint the locally corrupted frames, while a Deep Flow Network computes the local bidirectional corrupted flows. The local corrupted optical flow maps are then fed into a Local Flow Completion Network (LFCN), built with pseudo-3D convolutions and an attention mechanism, to obtain a complete set of bidirectional local optical flow maps. Finally, the coarsely inpainted local frames and the completed bidirectional local optical flow maps are passed to the spatial-temporal transformer, which outputs the inpainted video frames.
RESULTS: Experiments show that the algorithm achieves high-quality results on the video target removal task and improves on quantitative metrics compared with state-of-the-art methods.
CONCLUSION: The proposed transformer-guided video inpainting algorithm based on a local spatial-temporal joint obtains high-quality optical flow information and high-quality inpainted videos.
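The METHODS paragraph describes a flow-guided pipeline: corrupted flows are completed, then valid pixels are propagated along the bidirectional flows into the missing regions. The paper's LSTN, LFCN, and spatial-temporal transformer are learned networks and are not reproduced here; the following is only a minimal NumPy sketch of the underlying flow-guided propagation idea, with nearest-neighbour warping and the simplifying assumption that completed flows are already available. All function names are illustrative, not the authors' API.

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp a frame with a dense optical flow field.
    Nearest-neighbour sampling keeps the sketch dependency-free;
    a real implementation would use bilinear sampling."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return frame[src_y, src_x]

def flow_guided_fill(frames, masks, fwd_flows, bwd_flows):
    """Fill masked pixels (mask == 1) of each frame by propagating
    pixels from the previous frame (via forward flow) and, as a
    fallback, from the next frame (via backward flow). In the paper
    these flows would come from the LFCN-completed flow maps."""
    out = [f.copy() for f in frames]
    n = len(frames)
    for t in range(n):
        hole = masks[t].astype(bool)
        if t > 0:
            cand = out[t - 1] if not fwd_flows[t - 1].any() \
                else warp(out[t - 1], fwd_flows[t - 1])
            out[t][hole] = cand[hole]
            # crude validity check: a pixel stays a hole only if the
            # source frame was also masked there (ignores displacement)
            hole &= masks[t - 1].astype(bool)
        if t < n - 1 and hole.any():
            cand = warp(out[t + 1], bwd_flows[t])
            out[t][hole] = cand[hole]
    return out
```

In the actual algorithm this hand-written propagation is replaced by learned modules: the LSTN produces the coarse inpainting, the LFCN completes the corrupted flows, and the spatial-temporal transformer fuses the propagated content into the final frames.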
Copyright © 2023 J. Wang et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium, so long as the original work is properly cited.