Unraveling the Techniques for Speaker Diarization

Ganesh Pechetti; Anakapalli Rohini Durga Bhavani; Abhinav Dayal; Sreenu Ponnada

Cognitive Computing and Cyber Physical Systems. 4th EAI International Conference, IC4S 2023, Bhimavaram, Andhra Pradesh, India, August 4-6, 2023, Proceedings, Part I

Research Article



@INPROCEEDINGS{10.1007/978-3-031-48888-7_25,
    author={Ganesh Pechetti and Anakapalli Rohini Durga Bhavani and Abhinav Dayal and Sreenu Ponnada},
    title={Unraveling the Techniques for Speaker Diarization},
    proceedings={Cognitive Computing and Cyber Physical Systems. 4th EAI International Conference, IC4S 2023, Bhimavaram, Andhra Pradesh, India, August 4-6, 2023, Proceedings, Part I},
    proceedings_a={IC4S},
    year={2024},
    month={1},
    keywords={Speaker Diarization, Segmentation, Voice Activity Detection, Pyannote, Kaldi, NeMo},
    doi={10.1007/978-3-031-48888-7_25}
}

Ganesh Pechetti, Anakapalli Rohini Durga Bhavani, Abhinav Dayal, Sreenu Ponnada (2024). Unraveling the Techniques for Speaker Diarization. IC4S. Springer. DOI: 10.1007/978-3-031-48888-7_25

Ganesh Pechetti¹, Anakapalli Rohini Durga Bhavani¹, Abhinav Dayal¹,*, Sreenu Ponnada¹

1: Computer Science and Engineering Department, Vishnu Institute of Technology

*Contact email: abhinav.dayal@vishnu.edu.in

Abstract

This research paper aims to contribute to the field of speaker diarization by providing an in-depth analysis of existing audio datasets and evaluating prominent models. The study focuses on the suitability of these datasets for studying speaker diarization tasks and examines the performance of models such as pyannote-speaker diarization and NVIDIA NeMo speaker diarization. For aspiring researchers in the field, this paper serves as a solid foundation, offering valuable guidance and resources for experimentation in speaker diarization. The evaluation of the models reveals important insights. While each model has its advantages, their limitations must be considered. Overall, this research paper provides valuable insights into audio dataset analysis, model evaluation, and selection considerations for speaker diarization tasks. It equips researchers with essential knowledge to make informed decisions and lays the groundwork for further advancements in the field.

Keywords: Speaker Diarization, Segmentation, Voice Activity Detection, Pyannote, Kaldi, NeMo

Published: 2024-01-05
Appears in: SpringerLink

DOI link: http://dx.doi.org/10.1007/978-3-031-48888-7_25
