Mobile Multimedia Communications. 15th EAI International Conference, MobiMedia 2022, Virtual Event, July 22-24, 2022, Proceedings

Research Article

PF-Net: Personalized Filter for Speaker Recognition from Raw Waveform

Cite
@INPROCEEDINGS{10.1007/978-3-031-23902-1_28,
    author={Wencheng Li and Zhenhua Tan and Zhenche Xia and Danke Wu and Jingyu Ning},
    title={PF-Net: Personalized Filter for Speaker Recognition from Raw Waveform},
    proceedings={Mobile Multimedia Communications. 15th EAI International Conference, MobiMedia 2022, Virtual Event, July 22-24, 2022, Proceedings},
    proceedings_a={MOBIMEDIA},
    year={2023},
    month={2},
    keywords={Speaker recognition, Raw waveform, Personalized filters, Deep learning},
    doi={10.1007/978-3-031-23902-1_28}
}

Wencheng Li, Zhenhua Tan, Zhenche Xia, Danke Wu, Jingyu Ning (2023). PF-Net: Personalized Filter for Speaker Recognition from Raw Waveform. MOBIMEDIA. Springer. DOI: 10.1007/978-3-031-23902-1_28
Wencheng Li¹, Zhenhua Tan¹*, Zhenche Xia¹, Danke Wu¹, Jingyu Ning¹
¹ School of Software, Northeastern University
*Contact email: tanzh@mail.neu.edu.cn

Abstract

Deep learning has largely replaced i-vector-based speaker recognition. In recent years, Convolutional Neural Networks (CNNs) that learn low-level speech representations directly from raw waveforms have been widely used for speaker recognition. Building on this approach, SincNet introduces a distinctive convolutional layer whose filters are band-pass filters: whereas a standard CNN learns every tap of each filter, SincNet learns only the low and high cut-off frequencies of each filter. This paper proposes an improved CNN architecture, PF-Net, which encourages the first convolutional layer to implement more personalized filters than SincNet. PF-Net parameterizes the shape of each filter in the frequency domain and realizes band-pass filters by learning a set of deformation points in the frequency domain. Compared with a standard CNN, PF-Net learns characteristic parameters of each filter rather than its raw taps; compared with SincNet, it learns more of these parameters than just the low and high cut-off frequencies. This yields a personalized filter bank that can adapt to different tasks. Our experiments show that PF-Net converges faster than a standard CNN and outperforms SincNet. Our code is available at github.com/TAN-OpenLab/PF-NET.
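
For reference, the SincNet band-pass filter mentioned above can be sketched in a few lines. In SincNet's formulation, each first-layer kernel is g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n), with sinc(x) = sin(x)/x and the cut-offs f1, f2 normalized by the sampling rate, multiplied by a Hamming window; only f1 and f2 describe the filter. This is a plain-Python sketch without autograd, so the cut-offs are ordinary floats rather than trainable parameters, and the function names, the 16 kHz rate, the 251-tap length and the 300 to 3400 Hz band are illustrative assumptions, not values taken from this paper.

```python
from __future__ import annotations

import math


def sinc(x: float) -> float:
    """Unnormalized sinc, sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass_kernel(f1_hz: float, f2_hz: float, length: int, fs: float) -> list[float]:
    """SincNet-style band-pass kernel: the difference of two low-pass sincs, Hamming-windowed.

    Only the two cut-offs describe the filter. Here they are plain floats; a trainable
    layer would instead update them by back-propagation.
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so the kernel is centred on n = 0")
    if not 0.0 <= f1_hz < f2_hz <= fs / 2:
        raise ValueError("need 0 <= f1 < f2 <= fs / 2")
    f1, f2 = f1_hz / fs, f2_hz / fs  # cut-offs in cycles per sample
    half = length // 2
    kernel = []
    for i in range(length):
        n = i - half  # centred sample index
        g = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))  # Hamming window
        kernel.append(g * w)
    return kernel


def gain_at(kernel: list[float], f_hz: float, fs: float) -> float:
    """Magnitude of the kernel's frequency response (DTFT) at a single frequency."""
    half = len(kernel) // 2
    omega = 2 * math.pi * f_hz / fs
    re = sum(h * math.cos(omega * (i - half)) for i, h in enumerate(kernel))
    im = sum(h * math.sin(omega * (i - half)) for i, h in enumerate(kernel))
    return math.hypot(re, im)


if __name__ == "__main__":
    k = sinc_bandpass_kernel(300.0, 3400.0, 251, 16000.0)
    print(round(gain_at(k, 1000.0, 16000.0), 3))  # inside the band: close to 1
    print(round(gain_at(k, 6000.0, 16000.0), 3))  # outside the band: close to 0
```

Whatever the pass band, the filter's shape is fixed (a flat band between two edges); only its position and width change. That two-parameter limit is what PF-Net is described as relaxing.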

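The abstract does not specify how the deformation points define a filter, so the following is only one possible reading, not the PF-Net implementation (the authors' repository is the reference for that). It assumes each filter is described by a hypothetical list of (frequency, gain) deformation points, linearly interpolates between them to obtain a target magnitude response, and converts that response into a zero-phase FIR kernel by frequency sampling followed by a Hamming window. The names `response_from_points`, `kernel_from_points` and `gain_at`, and the example points, are illustrative assumptions. Unlike a two-cut-off sinc filter, such a point list can encode shapes like a dip inside the pass band.

```python
from __future__ import annotations

import math


def response_from_points(points: list[tuple[float, float]], f_hz: float) -> float:
    """Hypothetical target gain at f_hz: linear interpolation between (frequency_hz, gain)
    deformation points, and 0 outside the range they span."""
    if not points:
        raise ValueError("need at least one deformation point")
    pts = sorted(points)
    if f_hz < pts[0][0] or f_hz > pts[-1][0]:
        return 0.0
    for (fa, ga), (fb, gb) in zip(pts, pts[1:]):
        if fa <= f_hz <= fb:
            t = 0.0 if fb == fa else (f_hz - fa) / (fb - fa)
            return ga + t * (gb - ga)
    return pts[-1][1]  # only reached when a single point is given


def kernel_from_points(points: list[tuple[float, float]], length: int, fs: float) -> list[float]:
    """Zero-phase FIR kernel whose response follows the interpolated points.

    Frequency-sampling design: sample the target gain on the DFT grid k * fs / length,
    take the real, even inverse DFT (a cosine sum), then apply a Hamming window.
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so the kernel is centred on n = 0")
    half = length // 2
    gains = [response_from_points(points, k * fs / length) for k in range(half + 1)]
    kernel = []
    for i in range(length):
        n = i - half  # centred sample index
        h = gains[0] + 2 * sum(
            gains[k] * math.cos(2 * math.pi * k * n / length) for k in range(1, half + 1)
        )
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))  # Hamming window
        kernel.append(w * h / length)
    return kernel


def gain_at(kernel: list[float], f_hz: float, fs: float) -> float:
    """Magnitude of the kernel's frequency response (DTFT) at a single frequency."""
    half = len(kernel) // 2
    omega = 2 * math.pi * f_hz / fs
    re = sum(h * math.cos(omega * (i - half)) for i, h in enumerate(kernel))
    im = sum(h * math.sin(omega * (i - half)) for i, h in enumerate(kernel))
    return math.hypot(re, im)


if __name__ == "__main__":
    # Illustrative points: a band from about 200 to 3600 Hz with a dip to 0.6 at 1500 Hz.
    pts = [(200.0, 0.0), (400.0, 1.0), (1500.0, 0.6), (3000.0, 1.0), (3600.0, 0.0)]
    k = kernel_from_points(pts, 251, 16000.0)
    print(round(response_from_points(pts, 1000.0), 3), round(gain_at(k, 1000.0, 16000.0), 3))
```

In a trainable version, the gains (and possibly the point positions) would be parameters updated by back-propagation. Piecewise-linear interpolation is differentiable with respect to the gains, which makes it a convenient choice for this kind of sketch.
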
Keywords
Speaker recognition, Raw waveform, Personalized filters, Deep learning
Published
2023-02-01
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-23902-1_28