EAI Endorsed Transactions on Internet of Things, Volume 11, Issue 1 (2024)

Research Article

Semantic Image Synthesis from Text: Current Trends and Future Horizons in Text-to-Image Generation

Cite

BibTeX:
@ARTICLE{10.4108/eetiot.5336,
    author={Lakshmanan Sudha and Kari Balakrishnan Aruna and Vijayakumar Sureka and Mathavan Niveditha and S Prema},
    title={Semantic Image Synthesis from Text: Current Trends and Future Horizons in Text-to-Image Generation},
    journal={EAI Endorsed Transactions on Internet of Things},
    volume={11},
    number={1},
    publisher={EAI},
    journal_a={IOT},
    year={2024},
    month={12},
    keywords={Text-to-Image Generation, Generative Adversarial Networks (GANs), Multimodal Models, Natural Language Processing, Computer Vision, Ethical AI, Interpretability},
    doi={10.4108/eetiot.5336}
}
    
Plain Text:

Lakshmanan Sudha, Kari Balakrishnan Aruna, Vijayakumar Sureka, Mathavan Niveditha, and S Prema (2024). Semantic Image Synthesis from Text: Current Trends and Future Horizons in Text-to-Image Generation. EAI Endorsed Transactions on Internet of Things (IOT), 11(1). EAI. DOI: 10.4108/eetiot.5336
Lakshmanan Sudha1,*, Kari Balakrishnan Aruna2, Vijayakumar Sureka2, Mathavan Niveditha2, S Prema3
  • 1: S.A. Engineering College
  • 2: S.A. Engineering College
  • 3: Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai
*Contact email: sudhal@saec.ac.in

Abstract

Text-to-image generation, a captivating intersection of natural language processing and computer vision, has undergone a remarkable evolution in recent years. This research paper provides a comprehensive review of the state-of-the-art in text-to-image generation techniques, highlighting key advancements and emerging trends. We begin by surveying the foundational models, with a focus on Generative Adversarial Networks (GANs) and their pivotal role in generating realistic and diverse images from textual descriptions. We delve into the intricacies of training data, model architectures, and evaluation metrics, offering insights into the challenges and opportunities in this field. Furthermore, this paper explores the synergistic relationship between natural language processing and computer vision, showcasing multimodal models like DALL-E and CLIP. These models not only generate images from text but also understand the contextual relationships between textual descriptions and images, opening avenues for content recommendation, search engines, and visual storytelling. The paper discusses applications spanning art, design, e-commerce, healthcare, and education, where text-to-image generation has made significant inroads. We highlight the potential of this technology in automating content creation, aiding in diagnostics, and transforming the fashion and e-commerce industries. However, the journey of text-to-image generation is not without its challenges. We address ethical considerations, emphasizing responsible AI and the mitigation of biases in generated content. We also explore interpretability and model transparency, critical for ensuring trust and accountability.
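To make the GAN-based approach surveyed above concrete, the sketch below shows a minimal text-conditioned generator in PyTorch: a text embedding is concatenated with a noise vector and upsampled to an image. This is an illustrative toy under assumed sizes (embedding dimension, layer widths, 64x64 output), not the architecture of any specific model covered in the review.

import torch
import torch.nn as nn

class TextConditionedGenerator(nn.Module):
    """Toy conditional generator: a text embedding is concatenated with
    a noise vector and upsampled to a 64x64 RGB image (illustrative sizes)."""
    def __init__(self, noise_dim=100, text_dim=256, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            # (noise_dim + text_dim) x 1 x 1 -> (feat*8) x 4 x 4
            nn.ConvTranspose2d(noise_dim + text_dim, feat * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(feat * 8), nn.ReLU(True),
            # -> (feat*4) x 8 x 8
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 4), nn.ReLU(True),
            # -> (feat*2) x 16 x 16
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 2), nn.ReLU(True),
            # -> feat x 32 x 32
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat), nn.ReLU(True),
            # -> 3 x 64 x 64, pixel values in [-1, 1]
            nn.ConvTranspose2d(feat, 3, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, noise, text_emb):
        # noise: (B, noise_dim), text_emb: (B, text_dim)
        z = torch.cat([noise, text_emb], dim=1)
        return self.net(z.unsqueeze(-1).unsqueeze(-1))

# Example: four images conditioned on random stand-in "text" embeddings.
gen = TextConditionedGenerator()
fake = gen(torch.randn(4, 100), torch.randn(4, 256))  # shape (4, 3, 64, 64)

In a full text-to-image GAN the embedding would come from a pretrained text encoder and a conditional discriminator would score image-text pairs; those components are omitted here for brevity.

The CLIP-style multimodal scoring mentioned in the abstract can also be exercised directly: given an image and candidate captions, CLIP estimates how well each caption matches the image. The snippet below is a brief usage sketch with the Hugging Face transformers implementation of CLIP; the image path and the captions are placeholders.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder image path
captions = [
    "a dog playing fetch in a park",
    "a city skyline at night",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the similarity of the image to each caption
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(captions, probs[0].tolist())))

The same image-text similarity signal underlies the content recommendation and search applications discussed in the paper.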

Keywords: Text-to-Image Generation, Generative Adversarial Networks (GANs), Multimodal Models, Natural Language Processing, Computer Vision, Ethical AI, Interpretability

Received: 2024-12-02
Accepted: 2024-12-02
Published: 2024-12-02
Publisher: EAI
DOI: http://dx.doi.org/10.4108/eetiot.5336

Copyright © 2024 L. Sudha et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.

Indexed in: EBSCO, ProQuest, DBLP, DOAJ, Portico