Cognitive Computing and Cyber Physical Systems. 4th EAI International Conference, IC4S 2023, Bhimavaram, Andhra Pradesh, India, August 4-6, 2023, Proceedings, Part I

Research Article

Speaker Recognition Using Convolutional Autoencoder in Mismatch Condition with Small Dataset in Noisy Background

Cite
  • @INPROCEEDINGS{10.1007/978-3-031-48888-7_27,
        author={Arundhati Niwatkar and Yuvraj Kanse and Ajay Kumar Kushwaha},
        title={Speaker Recognition Using Convolutional Autoencoder in Mismatch Condition with Small Dataset in Noisy Background},
        proceedings={Cognitive Computing and Cyber Physical Systems. 4th EAI International Conference, IC4S 2023, Bhimavaram, Andhra Pradesh, India, August 4-6, 2023, Proceedings, Part I},
        proceedings_a={IC4S},
        year={2024},
        month={1},
        keywords={convolutional autoencoder, deep-learning, speaker recognition, MFCC, mismatch condition},
        doi={10.1007/978-3-031-48888-7_27}
    }
    
  • Arundhati Niwatkar, Yuvraj Kanse, Ajay Kumar Kushwaha. Speaker Recognition Using Convolutional Autoencoder in Mismatch Condition with Small Dataset in Noisy Background. IC4S, Springer, 2024. DOI: 10.1007/978-3-031-48888-7_27
Arundhati Niwatkar1, Yuvraj Kanse2, Ajay Kumar Kushwaha*
  • 1: Sivaji University
  • 2: Karmaveer Bhaurao Patil College of Engineering
*Contact email: akkushwaha@bvucoep.edu.in

Abstract

The objective of this paper is to increase the success rate and accuracy of speaker recognition and identification systems by proposing a novel approach. Data augmentation techniques are employed to enhance a small dataset of audio recordings from five speakers, comprising both male and female voices. The Python programming language is used for data processing, and the chosen model is a convolutional autoencoder. To convert the speech signals into images, their spectrograms are used, so a set of spectrogram images serves as the input for training the autoencoder. A speaker recognition and identification system is developed using the convolutional autoencoder, a deep learning technique, and its results are compared against those of traditional systems that rely on the MFCC feature extraction technique. The proposed system exhibits a high success rate, indicating its efficacy in accurately recognising and identifying speakers. To account for a “mismatch condition,” different time durations of the audio signal are used during the training and testing phases. Through a series of experiments with various activation and loss functions in permutation and combination, the optimal pair for the small dataset is identified, yielding favorable outcomes. In matched conditions, the system achieves an accuracy rate of 92.4%.
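
As a concrete illustration of the pipeline outlined in the abstract, the following sketch converts short audio clips into fixed-size log-mel spectrogram images and trains a small convolutional autoencoder on them. It is a minimal example only, assuming librosa for spectrogram extraction and Keras for the model; the layer sizes, the ReLU/sigmoid activation pair, the MSE loss, and the helper names (audio_to_spectrogram, build_conv_autoencoder) are illustrative assumptions, not the authors' reported configuration.

# Minimal sketch (not the authors' exact implementation): speech clips ->
# log-mel spectrogram "images" -> convolutional autoencoder.
# Assumptions: librosa for feature extraction, Keras for the model;
# layer sizes, activations, and the MSE loss are illustrative choices only.
import numpy as np
import librosa
from tensorflow import keras
from tensorflow.keras import layers

def audio_to_spectrogram(path, sr=16000, n_mels=64, frames=64):
    """Load an audio clip and return a fixed-size, normalised log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Scale to [0, 1] and pad/crop the time axis to a fixed number of frames
    log_mel = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
    if log_mel.shape[1] < frames:
        log_mel = np.pad(log_mel, ((0, 0), (0, frames - log_mel.shape[1])))
    return log_mel[:, :frames]

def build_conv_autoencoder(input_shape=(64, 64, 1)):
    """Small convolutional autoencoder; the bottleneck can act as a speaker embedding."""
    inputs = keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D(2, padding="same")(x)
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
    encoded = layers.MaxPooling2D(2, padding="same")(x)          # bottleneck
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(encoded)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Hypothetical usage: wav_paths is a list of (possibly augmented) training clips.
# specs = np.stack([audio_to_spectrogram(p) for p in wav_paths])[..., np.newaxis]
# autoencoder = build_conv_autoencoder(specs.shape[1:])
# autoencoder.fit(specs, specs, epochs=50, batch_size=8, validation_split=0.2)

In such a setup, the trained encoder's bottleneck features would be matched against enrolled speakers for recognition; the actual recognition stage, the activation/loss pairs tested, and the mismatch-condition protocol are detailed in the full paper.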

Keywords
convolutional autoencoder, deep-learning, speaker recognition, MFCC, mismatch condition
Published
2024-01-05
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-48888-7_27
