Cognitive Computing and Cyber Physical Systems. 4th EAI International Conference, IC4S 2023, Bhimavaram, Andhra Pradesh, India, August 4-6, 2023, Proceedings, Part I

Research Article

Speaker Recognition Using Convolutional Autoencoder in Mismatch Condition with Small Dataset in Noisy Background

Cite
  • @INPROCEEDINGS{10.1007/978-3-031-48888-7_27,
        author={Arundhati Niwatkar and Yuvraj Kanse and Ajay Kumar Kushwaha},
        title={Speaker Recognition Using Convolutional Autoencoder in Mismatch Condition with Small Dataset in Noisy Background},
        proceedings={Cognitive Computing and Cyber Physical Systems. 4th EAI International Conference, IC4S 2023, Bhimavaram, Andhra Pradesh, India, August 4-6, 2023, Proceedings, Part I},
        proceedings_a={IC4S},
        year={2024},
        month={1},
        keywords={convolutional autoencoder, deep-learning, speaker recognition, MFCC, mismatch condition},
        doi={10.1007/978-3-031-48888-7_27}
    }
    
  • Arundhati Niwatkar, Yuvraj Kanse, Ajay Kumar Kushwaha. Speaker Recognition Using Convolutional Autoencoder in Mismatch Condition with Small Dataset in Noisy Background. IC4S, Springer, 2024. DOI: 10.1007/978-3-031-48888-7_27
Arundhati Niwatkar1, Yuvraj Kanse2, Ajay Kumar Kushwaha*
  • 1: Sivaji University
  • 2: Karmaveer Bhaurao Patil College of Engineering
*Contact email: akkushwaha@bvucoep.edu.in

Abstract

The objective of this paper is to increase the success rate and accuracy of speaker recognition and identification systems by proposing a novel approach. Data augmentation techniques are employed to enhance a small dataset of audio recordings from five speakers, comprising both male and female voices. The Python programming language is used for data processing, and the chosen model is a convolutional autoencoder. To convert the speech signals into images, their spectrograms are used, so a set of spectrogram images serves as the input for training the autoencoder. A speaker recognition and identification system is developed using the convolutional autoencoder, a deep learning technique, and its results are compared against those of traditional systems that rely on the MFCC feature extraction technique. The proposed system exhibits a high success rate, indicating its efficacy in accurately recognising and identifying speakers. To account for a “mismatch condition,” different time durations of the audio signal are used during the training and testing phases. Through a series of experiments with various activation and loss functions in permutation and combination, the optimal pair for the small dataset is identified, yielding favorable outcomes. In matched conditions, the system achieves an accuracy rate of 92.4%.
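
As a concrete illustration of the pipeline outlined in the abstract, the following sketch converts short audio clips into fixed-size log-mel spectrogram images and trains a small convolutional autoencoder on them. It is a minimal example only, assuming librosa for spectrogram extraction and Keras for the model; the layer sizes, the ReLU/sigmoid activation pair, the MSE loss, and the helper names (audio_to_spectrogram, build_conv_autoencoder) are illustrative assumptions, not the authors' reported configuration.

# Minimal sketch (not the authors' exact implementation): speech clips ->
# log-mel spectrogram "images" -> convolutional autoencoder.
# Assumptions: librosa for feature extraction, Keras for the model;
# layer sizes, activations, and the MSE loss are illustrative choices only.
import numpy as np
import librosa
from tensorflow import keras
from tensorflow.keras import layers

def audio_to_spectrogram(path, sr=16000, n_mels=64, frames=64):
    """Load an audio clip and return a fixed-size, normalised log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Scale to [0, 1] and pad/crop the time axis to a fixed number of frames
    log_mel = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
    if log_mel.shape[1] < frames:
        log_mel = np.pad(log_mel, ((0, 0), (0, frames - log_mel.shape[1])))
    return log_mel[:, :frames]

def build_conv_autoencoder(input_shape=(64, 64, 1)):
    """Small convolutional autoencoder; the bottleneck can act as a speaker embedding."""
    inputs = keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D(2, padding="same")(x)
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
    encoded = layers.MaxPooling2D(2, padding="same")(x)          # bottleneck
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(encoded)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Hypothetical usage: wav_paths is a list of (possibly augmented) training clips.
# specs = np.stack([audio_to_spectrogram(p) for p in wav_paths])[..., np.newaxis]
# autoencoder = build_conv_autoencoder(specs.shape[1:])
# autoencoder.fit(specs, specs, epochs=50, batch_size=8, validation_split=0.2)

In such a setup, the trained encoder's bottleneck features would be matched against enrolled speakers for recognition; the actual recognition stage, the activation/loss pairs tested, and the mismatch-condition protocol are detailed in the full paper.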

Keywords
convolutional autoencoder, deep-learning, speaker recognition, MFCC, mismatch condition
Published
2024-01-05
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-48888-7_27
