Automated Image Caption Generation using CNN and LSTM

M Vinodh Kumar; P Lakshmi Karthikeya; G Sai Chand; Sivadi Balakrishna

Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part II

Research Article

Cite (BibTeX):

@INPROCEEDINGS{10.4108/eai.28-4-2025.2358102,
    author={M Vinodh Kumar and P Lakshmi Karthikeya and G Sai Chand and Sivadi Balakrishna},
    title={Automated Image Caption Generation using CNN and LSTM},
    proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part II},
    publisher={EAI},
    proceedings_a={ICITSM PART II},
    year={2025},
    month={10},
    keywords={helmet infractions number plate recognition optical character recognition (ocr) traffic violations and you only look once (yolov11)},
    doi={10.4108/eai.28-4-2025.2358102}
}

M Vinodh Kumar¹,*, P Lakshmi Karthikeya¹, G Sai Chand¹, Sivadi Balakrishna¹

1: Vignan’s Foundation for Science, Technology and Research

*Contact email: vinodhkumarmainam@gmail.com

Abstract

Image captioning is the challenging task of automatically generating a description for an image by combining computer vision and natural language processing. This work integrates a convolutional neural network (CNN) and a long short-term memory (LSTM) network into a deep learning-based automatic captioning model. The CNN serves as a visual feature extractor, capturing important patterns from the input images. These features are then passed to the LSTM, which generates grammatically and semantically meaningful captions. The model is trained on the Flickr8k dataset, in which each image is paired with several human-written captions. Several preprocessing techniques are used to improve the linguistic representation: tokenization, sequence padding, and text embedding. The model is evaluated by comparing generated captions against the reference descriptions using the BLEU (Bilingual Evaluation Understudy) score, which can be computed directly once a caption has been generated. Experimental results indicate that the proposed model captures the visual semantics of the images and generates reasonable descriptions, showing the potential of deep learning for automatic image understanding.
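The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function names (`tokenize`, `build_vocab`, `encode_and_pad`, `bleu`) are hypothetical, the tokenizer is a simple lowercase-and-split rule, and the BLEU variant shown (uniform weights up to bigrams, no smoothing) is an assumption, since the abstract does not state the n-gram order or smoothing used.

```python
import math
from collections import Counter


def tokenize(caption):
    """Lowercase, drop punctuation, and split on whitespace (assumed rule)."""
    cleaned = "".join(ch for ch in caption.lower() if ch.isalnum() or ch.isspace())
    return cleaned.split()


def build_vocab(captions):
    """Map each word to an integer id; id 0 is reserved for padding."""
    words = sorted({w for c in captions for w in tokenize(c)})
    return {w: i + 1 for i, w in enumerate(words)}


def encode_and_pad(caption, vocab, max_len):
    """Turn a caption into a fixed-length id sequence, right-padded with 0."""
    ids = [vocab[w] for w in tokenize(caption) if w in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=2):
    """Sentence-level BLEU: clipped n-gram precision with a brevity penalty.

    Uniform weights over n = 1..max_n, no smoothing (an assumption).
    """
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # For each n-gram, the largest count seen in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    references = ["A dog runs on the grass.", "A brown dog is running outside."]
    vocab = build_vocab(references)
    print(encode_and_pad("A dog runs", vocab, max_len=6))
    generated = tokenize("a dog runs outside")
    print(bleu(generated, [tokenize(r) for r in references]))
```

In a full pipeline the padded id sequences would feed an embedding layer ahead of the LSTM, and BLEU would normally be averaged over a held-out test split rather than computed for a single caption.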

Keywords: helmet infractions, number plate recognition, optical character recognition (ocr), traffic violations and you only look once (yolov11)

Published: 2025-10-14
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.28-4-2025.2358102
