A Comprehensive Survey of Text Encoders for Text-to-Image Diffusion Models

Shun Fang

AIRO 24(1)

Research Article

Cite (BibTeX):

@ARTICLE{10.4108/airo.5566,
    author={Shun Fang},
    title={A Comprehensive Survey of Text Encoders for Text-to-Image Diffusion Models},
    journal={EAI Endorsed Transactions on AI and Robotics},
    volume={3},
    number={1},
    publisher={EAI},
    journal_a={AIRO},
    year={2024},
    month={12},
    keywords={NLP, CLIP, T5-XXL, BERT, Text Encoder},
    doi={10.4108/airo.5566}
}

Plain text citation: Shun Fang (2024). A Comprehensive Survey of Text Encoders for Text-to-Image Diffusion Models. AIRO. EAI. DOI: 10.4108/airo.5566

Shun Fang¹,*

1: Peking University

*Contact email: fangshun@pku.org.cn

Abstract

This survey examines text encoders for text-to-image diffusion models, focusing on their principles, challenges, and opportunities. We review state-of-the-art models, including BERT, T5-XXL, and CLIP, which have reshaped language understanding and cross-modal interaction. With their distinct architectures and training techniques, these models make it possible to generate images from textual descriptions. They also face limitations, such as computational complexity and data scarcity. We discuss these issues and highlight opportunities for further research. By providing a broad overview, this survey aims to support the ongoing development of text-to-image diffusion models and more accurate and efficient image generation from textual inputs.

Keywords: NLP, CLIP, T5-XXL, BERT, Text Encoder

Received: 2024-12-04
Accepted: 2024-12-04
Published: 2024-12-04
Publisher: EAI

DOI: http://dx.doi.org/10.4108/airo.5566

Copyright © 2024 Fang et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.
