EAI Endorsed Transactions on e-Learning 22(4): e2

Research Article

Transformer-Guided Video Inpainting Algorithm Based on Local Spatial-Temporal Joint

Cite (BibTeX)
@ARTICLE{10.4108/eetel.3156,
    author={Jing Wang and ZongJu Yang},
    title={Transformer-Guided Video Inpainting Algorithm Based on Local Spatial-Temporal Joint},
    journal={EAI Endorsed Transactions on e-Learning},
    volume={8},
    number={4},
    publisher={EAI},
    journal_a={EL},
    year={2024},
    month={12},
    keywords={video inpainting algorithm, flow-guided, attention mechanism, spatial-temporal transformer, Deep Flow Network, video target removal},
    doi={10.4108/eetel.3156}
}
    
Jing Wang1,*, ZongJu Yang1
  • 1: Henan Polytechnic University
*Contact email: wjasmine@hpu.edu.cn

Abstract

INTRODUCTION: Video inpainting is an important task in computer vision and a key component of many practical applications, including video occlusion removal, traffic monitoring, and old-film restoration. The goal is to fill missing regions with plausible content drawn from the video sequence while maintaining temporal continuity and spatial consistency.

OBJECTIVES: In previous studies, the complexity of video scenes, where objects move quickly or background objects themselves move, often caused optical flow estimation to fail. As a result, current video inpainting algorithms do not yet meet the requirements of practical applications. To avoid optical flow failure, this paper proposes a transformer-guided video inpainting model based on a local spatial-temporal joint.

METHODS: First, to exploit the rich spatial-temporal relationships between local flows, a Local Spatial-Temporal Joint Network (LSTN), consisting of an encoder, a decoder, and a transformer module, is designed to coarsely inpaint the locally corrupted frames, while a Deep Flow Network computes the local bidirectional corrupted flows. Then, the local corrupted optical flow maps are fed into a Local Flow Completion Network (LFCN), built with pseudo-3D convolutions and an attention mechanism, to obtain a complete set of bidirectional local optical flow maps. Finally, the coarsely inpainted local frames and the completed bidirectional local optical flow maps are passed to a spatial-temporal transformer, which outputs the inpainted video frames.

RESULTS: Experiments show that the algorithm achieves high-quality results on the video target removal task and improves on quantitative metrics compared with state-of-the-art methods.

CONCLUSION: The proposed transformer-guided video inpainting algorithm based on a local spatial-temporal joint obtains high-quality optical flow information and inpainted result videos.
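The LFCN described above uses pseudo-3D convolutions, which factorize a full 3D kernel into a 2D spatial convolution followed by a 1D temporal convolution. As a minimal sketch of why this factorization is cheaper, the parameter counts can be compared directly (the function names below are illustrative, not from the paper):

```python
# Sketch: parameter cost of a dense 3D convolution vs. a pseudo-3D (P3D)
# factorization into a 1 x k x k spatial kernel followed by a k x 1 x 1
# temporal kernel. Function names are illustrative, not the authors' API.

def conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a dense k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k * k * k

def pseudo3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights when the kernel is split into a spatial (1 x k x k) stage
    followed by a temporal (k x 1 x 1) stage, as in P3D-style blocks."""
    spatial = c_in * c_out * k * k   # 2D convolution over H x W
    temporal = c_out * c_out * k     # 1D convolution over the time axis
    return spatial + temporal

if __name__ == "__main__":
    # For 64 -> 64 channels with k = 3, factorization uses fewer than
    # half the weights of the dense 3D kernel.
    print(conv3d_params(64, 64, 3), pseudo3d_params(64, 64, 3))
```

This is why P3D-style blocks are a common way to process video tensors (time x height x width) in flow-completion networks at lower cost than full 3D convolutions.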

Keywords
video inpainting algorithm, flow-guided, attention mechanism, spatial-temporal transformer, Deep Flow Network, video target removal
Received
2024-12-10
Accepted
2024-12-10
Published
2024-12-10
Publisher
EAI
http://dx.doi.org/10.4108/eetel.3156

Copyright © 2024 J. Wang et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.

Indexed in: EBSCO, ProQuest, DBLP, DOAJ, Portico