ct 18: e2

Research Article

A Survey of Audio Synthesis and Lip-syncing for Synthetic Video Generation

  • @ARTICLE{10.4108/eai.14-4-2021.169187,
        author={Anup Kadam and Sagar Rane and Arpit Kumar Mishra and Shailesh Kumar Sahu and Shubham Singh and Shivam Kumar Pathak},
        title={A Survey of Audio Synthesis and Lip-syncing for Synthetic Video Generation},
        journal={EAI Endorsed Transactions on Creative Technologies: Online First},
        keywords={Video Synthesis, Voice Cloning, Lip Synchronization, Video Generation Application},
  • Year: 2021
    DOI: 10.4108/eai.14-4-2021.169187
Anup Kadam1, Sagar Rane1,*, Arpit Kumar Mishra1, Shailesh Kumar Sahu1, Shubham Singh1, Shivam Kumar Pathak1
  • 1: Department of Computer Engineering, Army Institute of Technology, Pune, MH, India
*Contact email: sagarrane@aitpune.edu.in


Fields such as media, education, and the corporate sector have started focusing on content creation. This has led to high demand for synthetic media generation from limited data. To synthesize a high-quality artificial video, the lip movements must be synchronized with the audio. Here we compare various methods for voice cloning and lip synchronization. The voice cloning approaches include state-of-the-art methods such as WaveNet and other text-to-speech approaches. The lip synchronization approaches are divided into constrained and unconstrained methods. Recent work such as LipGAN and Wav2Lip is discussed. The methods are compared and the best method is suggested. In addition to studying and comparing the various methods, we cover their drawbacks, future scope, and applications. Social and ethical issues are also discussed.