Research Article
Semantic Image Synthesis from Text: Current Trends and Future Horizons in Text-to-Image Generation
@ARTICLE{10.4108/eetiot.5336,
  author={Lakshmanan Sudha and Kari Balakrishnan Aruna and Vijayakumar Sureka and Mathavan Niveditha and S Prema},
  title={Semantic Image Synthesis from Text: Current Trends and Future Horizons in Text-to-Image Generation},
  journal={EAI Endorsed Transactions on Internet of Things},
  volume={11},
  number={1},
  publisher={EAI},
  journal_a={IOT},
  year={2024},
  month={12},
  keywords={Text-to-Image Generation, Generative Adversarial Networks (GANs), Multimodal Models, Natural Language Processing, Computer Vision, Ethical AI, Interpretability},
  doi={10.4108/eetiot.5336}
}
Authors: Lakshmanan Sudha, Kari Balakrishnan Aruna, Vijayakumar Sureka, Mathavan Niveditha, S Prema
Year: 2024
Journal: EAI Endorsed Transactions on Internet of Things (IOT)
Publisher: EAI
DOI: 10.4108/eetiot.5336
Abstract
Text-to-image generation, at the intersection of natural language processing and computer vision, has evolved rapidly in recent years. This paper reviews the state of the art in text-to-image generation techniques, highlighting key advances and emerging trends. We begin by surveying the foundational models, focusing on Generative Adversarial Networks (GANs) and their pivotal role in generating realistic and diverse images from textual descriptions. We then examine training data, model architectures, and evaluation metrics, and discuss the challenges and opportunities in this field. The paper also explores the interplay between natural language processing and computer vision through multimodal models such as DALL-E and CLIP. Together, these models generate images from text and capture the contextual relationships between textual descriptions and images, opening avenues for content recommendation, search engines, and visual storytelling. We discuss applications spanning art, design, e-commerce, healthcare, and education, where text-to-image generation has made significant inroads, and highlight its potential for automating content creation, aiding diagnostics, and transforming the fashion and e-commerce industries. Text-to-image generation also faces challenges. We address ethical considerations, emphasizing responsible AI and the mitigation of biases in generated content. We also explore interpretability and model transparency, which are critical for ensuring trust and accountability.
Copyright © 2024 L. Sudha et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.