Cognitive Computing and Cyber Physical Systems. 4th EAI International Conference, IC4S 2023, Bhimavaram, Andhra Pradesh, India, August 4-6, 2023, Proceedings, Part I

Research Article

Unraveling the Techniques for Speaker Diarization

Cite
  • @INPROCEEDINGS{10.1007/978-3-031-48888-7_25,
        author={Ganesh Pechetti and Anakapalli Rohini Durga Bhavani and Abhinav Dayal and Sreenu Ponnada},
        title={Unraveling the Techniques for Speaker Diarization},
        proceedings={Cognitive Computing and Cyber Physical Systems. 4th EAI International Conference, IC4S 2023, Bhimavaram, Andhra Pradesh, India, August 4-6, 2023, Proceedings, Part I},
        proceedings_a={IC4S},
        year={2024},
        month={1},
        keywords={Speaker Diarization, Segmentation, Voice Activity Detection, Pyannote, Kaldi, NeMo},
        doi={10.1007/978-3-031-48888-7_25}
    }
    
Ganesh Pechetti¹, Anakapalli Rohini Durga Bhavani¹, Abhinav Dayal¹,*, Sreenu Ponnada¹
  • ¹ Computer Science and Engineering Department, Vishnu Institute of Technology
*Contact email: abhinav.dayal@vishnu.edu.in

Abstract

This paper contributes to the field of speaker diarization by analyzing existing audio datasets and evaluating prominent models. The study assesses how suitable these datasets are for speaker diarization tasks and examines the performance of models such as pyannote speaker diarization and NVIDIA NeMo speaker diarization. For aspiring researchers, the paper offers a foundation, with guidance and resources for experimentation in speaker diarization. The evaluation shows that each model has its advantages and that their limitations must be weighed when selecting one. Overall, the paper covers audio dataset analysis, model evaluation, and selection considerations for speaker diarization tasks, equipping researchers to make informed decisions and laying the groundwork for further advances in the field.

Keywords
Speaker Diarization, Segmentation, Voice Activity Detection, Pyannote, Kaldi, NeMo
Published
2024-01-05
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-48888-7_25
Copyright © 2023–2025 ICST