Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I

Research Article

Automatic Speech Grading using a Multimodal Deep Learning Framework using Bert and Whisper

Cite (BibTeX):
@INPROCEEDINGS{10.4108/eai.28-4-2025.2357788,
    author={M. Hemantheswar Reddy and K. Rishitha and P. Bharath Raj and D N Kiran Pandiri and U. Thulasi Srinivas},
    title={Automatic Speech Grading using a Multimodal Deep Learning Framework using Bert and Whisper},
    booktitle={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part I},
    publisher={EAI},
    proceedings_a={ICITSM PART I},
    year={2025},
    month={10},
    keywords={speech grading, automatic speech recognition, whisper, nlp, pronunciation scoring, fluency measurement},
    doi={10.4108/eai.28-4-2025.2357788}
}
M. Hemantheswar Reddy1,*, K. Rishitha1, P. Bharath Raj1, D N Kiran Pandiri1, U. Thulasi Srinivas1
1: VFSTR Deemed to be University
*Contact email: hemanth14082004@gmail.com

Abstract

This paper proposes a Natural Language Processing (NLP)-based speech grading system that handles both audio and video recordings and quantitatively evaluates speech in terms of grammar, vocabulary, pronunciation, fluency, and accuracy. Conventional speech evaluation methods tend to be subjective and inefficient, and they give little feedback, which limits their usefulness for comprehensive assessment. The proposed system combines Automatic Speech Recognition (ASR) models such as Whisper, which transcribe speech to text, with NLP techniques that analyze and score the resulting transcripts in a standardized way. By providing detailed, actionable feedback, the system has the potential to improve the reliability and consistency of speech assessment. The approach has broad applications in education, recruitment, and communication training, and it offers a scalable, objective method of speech measurement.
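The abstract does not say how transcripts are turned into scores. As a rough illustration only, here is a minimal sketch of one possible scoring stage. It computes a few common fluency and vocabulary proxies (speaking rate, long pauses, pause ratio, type-token ratio) from timed ASR segments and then maps speaking rate and pausing to a 0-100 fluency score. The input format mirrors the `segments` list that openai-whisper's `transcribe()` returns, where each entry has `start`, `end` and `text`. Everything else is a hypothetical choice and not the authors' rubric: the `Segment` class, the metric definitions, the 0.5 s pause threshold and the 110-160 words-per-minute target band.

```python
import re
from dataclasses import dataclass


# Hypothetical container mirroring one entry of the "segments" list returned
# by openai-whisper's transcribe(): start/end times in seconds plus the text.
@dataclass
class Segment:
    start: float
    end: float
    text: str


WORD_RE = re.compile(r"[A-Za-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; punctuation and digits are dropped."""
    return [w.lower() for w in WORD_RE.findall(text)]


def speech_metrics(segments: list[Segment], pause_threshold: float = 0.5) -> dict[str, float]:
    """Illustrative fluency/vocabulary proxies (assumed metrics, not the paper's)."""
    words = [w for seg in segments for w in tokenize(seg.text)]
    total_time = segments[-1].end - segments[0].start if segments else 0.0
    if not words or total_time <= 0:
        return {"words_per_minute": 0.0, "type_token_ratio": 0.0,
                "long_pauses": 0.0, "pause_ratio": 0.0}
    speaking_time = sum(seg.end - seg.start for seg in segments)
    # Silent gaps between consecutive segments; gaps above the threshold count as long pauses.
    gaps = [nxt.start - cur.end for cur, nxt in zip(segments, segments[1:])]
    return {
        "words_per_minute": 60.0 * len(words) / total_time,
        "type_token_ratio": len(set(words)) / len(words),  # vocabulary variety
        "long_pauses": float(sum(1 for g in gaps if g > pause_threshold)),
        "pause_ratio": max(0.0, 1.0 - speaking_time / total_time),  # share of time silent
    }


def fluency_score(metrics: dict[str, float],
                  target_wpm: tuple[float, float] = (110.0, 160.0)) -> float:
    """Map speaking rate and pause ratio to 0-100 (hypothetical target band)."""
    low, high = target_wpm
    wpm = metrics["words_per_minute"]
    if wpm < low:
        rate = wpm / low   # too slow: scale down linearly
    elif wpm > high:
        rate = high / wpm  # too fast: scale down inversely
    else:
        rate = 1.0         # inside the target band
    return round(100.0 * rate * (1.0 - metrics["pause_ratio"]), 1)
```

With openai-whisper installed, the segments could be built from `result = whisper.load_model("base").transcribe("talk.wav")` as `[Segment(s["start"], s["end"], s["text"]) for s in result["segments"]]`. Whisper segments are phrase-level, so this sketch misses pauses that fall inside a segment. Word-level timestamps would give a finer pause measure.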

Keywords
speech grading, automatic speech recognition, whisper, nlp, pronunciation scoring, fluency measurement
Published
2025-10-13
Publisher
EAI
http://dx.doi.org/10.4108/eai.28-4-2025.2357788
Copyright © 2025 EAI