Research Article
A Survey of Data-Driven 2D Diffusion Models for Generating Images from Text
@article{10.4108/airo.5453,
  author    = {Shun Fang},
  title     = {A Survey of Data-Driven 2D Diffusion Models for Generating Images from Text},
  journal   = {EAI Endorsed Transactions on AI and Robotics},
  journal_a = {AIRO},
  volume    = {3},
  number    = {1},
  publisher = {EAI},
  year      = {2024},
  month     = {4},
  keywords  = {2D Diffusion Model, DDPM, HighLDM, Imagen},
  doi       = {10.4108/airo.5453}
}
- Shun Fang
Year: 2024
AIRO
EAI
DOI: 10.4108/airo.5453
Abstract
This paper surveys recent advances in generative modeling, focusing on DDPMs, HighLDM, and Imagen. DDPMs use denoising score matching and iterative refinement to reverse a fixed forward diffusion process, improving likelihood estimation and enabling lossless compression. HighLDM advances high-resolution image synthesis by running the diffusion process in the latent space of an efficient pretrained autoencoder, where cross-attention layers condition the denoiser on diverse inputs such as text prompts. Imagen pairs large pretrained transformer-based language models with cascaded high-resolution diffusion models for state-of-the-art text-to-image generation, producing highly realistic and semantically coherent images and surpassing competing systems on FID scores and human evaluations such as DrawBench. The survey critically examines each model's methods, contributions, performance, and limitations, and provides a comprehensive comparison of their theoretical underpinnings and practical implications, with the aim of informing future generative modeling research across applications.
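The iterative refinement the abstract attributes to DDPMs is ancestral sampling: starting from pure noise, a trained noise predictor is applied step by step to recover an image. Below is a minimal NumPy sketch of one reverse step, not code from the surveyed papers; the `eps_model` callable is a stand-in for the trained U-Net noise predictor, and the linear beta schedule and the sigma_t^2 = beta_t variance choice are illustrative assumptions.

```python
import numpy as np

def ddpm_reverse_step(x_t, t, eps_model, betas, rng):
    """One ancestral-sampling step of a DDPM: x_t -> x_{t-1}.

    eps_model(x, t) stands in for the trained noise predictor
    (a U-Net in practice); here it can be any callable.
    """
    alphas = 1.0 - betas
    alpha_bar_t = np.prod(alphas[: t + 1])  # cumulative product up to step t
    eps = eps_model(x_t, t)                 # predicted noise at step t
    # Posterior mean: subtract the (scaled) predicted noise, then rescale.
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean                         # final step adds no fresh noise
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(betas[t]) * z     # sigma_t^2 = beta_t variance choice

# Toy usage: denoise pure noise with a placeholder (zero) noise model.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)         # illustrative linear noise schedule
x = rng.standard_normal((8, 8))             # stand-in "image", shape 8x8
dummy_eps = lambda x, t: np.zeros_like(x)   # placeholder for the trained model
for t in reversed(range(50)):
    x = ddpm_reverse_step(x, t, dummy_eps, betas, rng)
```

With a real trained predictor, repeating this step from t = T down to 0 turns Gaussian noise into a sample; the placeholder model here only demonstrates the update's mechanics and shapes.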
Copyright © 2024 S. Fang, licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.