Research Article
A Survey of Data-Driven 2D Diffusion Models for Generating Images from Text
@article{10.4108/airo.5453,
  author    = {Shun Fang},
  title     = {A Survey of Data-Driven 2D Diffusion Models for Generating Images from Text},
  journal   = {EAI Endorsed Transactions on AI and Robotics},
  journal_a = {AIRO},
  volume    = {3},
  number    = {1},
  publisher = {EAI},
  year      = {2024},
  month     = {4},
  keywords  = {2D Diffusion Model, DDPM, HighLDM, Imagen},
  doi       = {10.4108/airo.5453}
}
- Shun Fang
Year: 2024
AIRO
EAI
DOI: 10.4108/airo.5453
Abstract
This paper surveys recent advances in generative modeling, focusing on DDPMs, HighLDM, and Imagen. DDPMs use denoising score matching and iterative refinement to reverse a fixed forward diffusion process, improving likelihood estimation and enabling lossless compression. HighLDM advances high-resolution image synthesis by running the diffusion process in the latent space of an efficient pretrained autoencoder, where cross-attention layers condition the denoiser on diverse inputs such as text prompts. Imagen pairs large pretrained transformer-based language models with cascaded high-resolution diffusion models for state-of-the-art text-to-image generation, producing highly realistic and semantically coherent images and surpassing competing systems on FID scores and human evaluations such as DrawBench. The survey critically examines each model's methods, contributions, performance, and limitations, and provides a comprehensive comparison of their theoretical underpinnings and practical implications, with the aim of informing future generative modeling research across applications.
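The iterative refinement the abstract attributes to DDPMs is ancestral sampling: starting from pure noise, a trained noise predictor is applied step by step to recover an image. Below is a minimal NumPy sketch of one reverse step, not code from the surveyed papers; the `eps_model` callable is a stand-in for the trained U-Net noise predictor, and the linear beta schedule and the sigma_t^2 = beta_t variance choice are illustrative assumptions.

```python
import numpy as np

def ddpm_reverse_step(x_t, t, eps_model, betas, rng):
    """One ancestral-sampling step of a DDPM: x_t -> x_{t-1}.

    eps_model(x, t) stands in for the trained noise predictor
    (a U-Net in practice); here it can be any callable.
    """
    alphas = 1.0 - betas
    alpha_bar_t = np.prod(alphas[: t + 1])  # cumulative product up to step t
    eps = eps_model(x_t, t)                 # predicted noise at step t
    # Posterior mean: subtract the (scaled) predicted noise, then rescale.
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean                         # final step adds no fresh noise
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(betas[t]) * z     # sigma_t^2 = beta_t variance choice

# Toy usage: denoise pure noise with a placeholder (zero) noise model.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)         # illustrative linear noise schedule
x = rng.standard_normal((8, 8))             # stand-in "image", shape 8x8
dummy_eps = lambda x, t: np.zeros_like(x)   # placeholder for the trained model
for t in reversed(range(50)):
    x = ddpm_reverse_step(x, t, dummy_eps, betas, rng)
```

With a real trained predictor, repeating this step from t = T down to 0 turns Gaussian noise into a sample; the placeholder model here only demonstrates the update's mechanics and shapes.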
Copyright © 2024 S. Fang, licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.