Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part II

Research Article

Automated Image Caption Generation using CNN and LSTM

Cite (BibTeX)
    @INPROCEEDINGS{10.4108/eai.28-4-2025.2358102,
        author={M Vinodh Kumar and P Lakshmi Karthikeya and G Sai Chand and Sivadi Balakrishna},
        title={Automated Image Caption Generation using CNN and LSTM},
        proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part II},
        publisher={EAI},
        proceedings_a={ICITSM PART II},
        year={2025},
        month={10},
        keywords={helmet infractions, number plate recognition, optical character recognition (ocr), traffic violations, you only look once (yolov11)},
        doi={10.4108/eai.28-4-2025.2358102}
    }
M Vinodh Kumar1,*, P Lakshmi Karthikeya1, G Sai Chand1, Sivadi Balakrishna1
  • 1: Vignan’s Foundation for Science Technology and Research
*Contact email: vinodhkumarmainam@gmail.com

Abstract

Image captioning is the challenging task of automatically generating a natural-language description of an image, combining computer vision and natural language processing. In this work, a CNN and an LSTM are integrated into a deep learning-based automatic captioning model. The CNN serves as a visual feature extractor, capturing important patterns from the input images. These features are then passed to an LSTM, which generates grammatically and semantically meaningful captions. The model is trained on the Flickr8k dataset, in which each image is paired with several human-written captions. Several preprocessing techniques, including tokenization, sequence padding, and text-embedding representation, are applied to enhance the linguistic representation. The model is evaluated by comparing generated captions against the reference descriptions using the BLEU (Bilingual Evaluation Understudy) metric. Experimental results demonstrate that the proposed model successfully captures the visual semantics of the images and generates reasonable descriptions, showing the power of deep learning for automatic image understanding.
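The text-side pipeline outlined in the abstract can be illustrated with a minimal sketch: tokenization, fixed-length sequence padding for batched LSTM input, and unigram BLEU (BLEU-1) scoring of a generated caption against a reference. All function names here (`simple_tokenize`, `pad_sequence`, `bleu1`) and the tiny vocabulary are illustrative assumptions, not the authors' actual implementation, which additionally uses a CNN feature extractor and a learned embedding layer.

```python
from collections import Counter

def simple_tokenize(caption, vocab):
    # Map words to integer ids; 0 is reserved for padding / unknown words.
    return [vocab.get(w, 0) for w in caption.lower().split()]

def pad_sequence(ids, max_len, pad_id=0):
    # Right-pad (or truncate) to a fixed length so captions can be batched.
    return (ids + [pad_id] * max_len)[:max_len]

def bleu1(candidate, reference):
    # Unigram precision with clipped counts: the fraction of candidate
    # words that also occur in the reference (a simplified BLEU-1,
    # without the brevity penalty of full BLEU).
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(n, ref[w]) for w, n in cand.items())
    return overlap / max(len(candidate), 1)

# Toy example with a hypothetical 5-word vocabulary.
vocab = {"a": 1, "dog": 2, "runs": 3, "on": 4, "grass": 5}
ids = simple_tokenize("A dog runs on grass", vocab)
padded = pad_sequence(ids, max_len=8)       # [1, 2, 3, 4, 5, 0, 0, 0]
score = bleu1("a dog runs".split(), "a dog runs on grass".split())
```

In the full model, the padded id sequences would feed an embedding layer and the LSTM decoder, while evaluation would use a standard BLEU implementation over the whole test set rather than this per-sentence toy score.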

Keywords
helmet infractions, number plate recognition, optical character recognition (ocr), traffic violations and you only look once (yolov11)
Published
2025-10-14
Publisher
EAI
http://dx.doi.org/10.4108/eai.28-4-2025.2358102
Copyright © 2025 EAI
