Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings

Research Article

CARN-Conformer: Conformer in Attention Spectral Mapping Based Convolutional Recurrent Networks for Speech Enhancement

Cite
@INPROCEEDINGS{10.1007/978-3-031-34790-0_21,
    author={Bo Fang and Hongqing Liu and Yi Zhou and Yizhuo Jiang and Lu Gan},
    title={CARN-Conformer: Conformer in Attention Spectral Mapping Based Convolutional Recurrent Networks for Speech Enhancement},
    proceedings={Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings},
    proceedings_a={CHINACOM},
    publisher={Springer},
    year={2023},
    month={6},
    keywords={Speech enhancement; Attention; Time-frequency domain},
    doi={10.1007/978-3-031-34790-0_21}
}
Bo Fang1,*, Hongqing Liu1, Yi Zhou1, Yizhuo Jiang2, Lu Gan2
  • 1: School of Communication and Information Engineering
  • 2: College of Engineering, Design and Physical Science, Brunel University
*Contact email: s200131155@stu.cqupt.edu.cn

Abstract

In recent years, attention-based Transformer models have been widely used in speech enhancement. The convolution-augmented Transformer (Conformer) models both the local and the global information of a speech sequence to achieve better performance. In this paper, we propose a speech enhancement structure that applies the Conformer in the time-frequency (TF) domain within DCCRN. To that end, the second LSTM layer in DCCRN is replaced with a TF-Conformer, so that information both between and within frames can be better utilized. An attention convolution path between the convolutional encoder and decoder is also developed to better convey nonlinear information. The results show that the model's PESQ surpasses DCCRN and DCCRN+ on the test set of the Interspeech 2020 Deep Noise Suppression (DNS) Challenge, with a model size of only 2.3 M parameters. Excellent results are also obtained on the blind test set of the ICASSP 2021 DNS Challenge, where the overall MOS score exceeds that of the winning team by 0.06.
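The core idea described in the abstract — sequence modeling applied along both the time and the frequency axes of a spectrogram feature map — can be sketched as follows. This is a minimal illustrative block in PyTorch, assuming self-attention along each axis with residual connections; the module name `TFBlock` and all dimensions are hypothetical and not the authors' implementation, which additionally uses the full Conformer layer (convolution modules and feed-forward sublayers) inside DCCRN.

```python
# Illustrative sketch of time-then-frequency axis attention on a
# (batch, channels, time, freq) feature map. Not the paper's code.
import torch
import torch.nn as nn

class TFBlock(nn.Module):
    """Apply self-attention along the time axis, then along the frequency axis."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.freq_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm_t = nn.LayerNorm(channels)
        self.norm_f = nn.LayerNorm(channels)

    def forward(self, x):
        # x: (B, C, T, F)
        b, c, t, f = x.shape
        # Time axis: treat each frequency bin as an independent sequence.
        xt = x.permute(0, 3, 2, 1).reshape(b * f, t, c)      # (B*F, T, C)
        at, _ = self.time_attn(xt, xt, xt)
        xt = self.norm_t(xt + at)                            # residual + norm
        x = xt.reshape(b, f, t, c).permute(0, 3, 2, 1)       # back to (B, C, T, F)
        # Frequency axis: treat each time frame as an independent sequence.
        xf = x.permute(0, 2, 3, 1).reshape(b * t, f, c)      # (B*T, F, C)
        af, _ = self.freq_attn(xf, xf, xf)
        xf = self.norm_f(xf + af)
        return xf.reshape(b, t, f, c).permute(0, 3, 1, 2)    # (B, C, T, F)

x = torch.randn(2, 8, 10, 16)   # (batch, channels, frames, freq bins)
y = TFBlock(8)(x)
print(tuple(y.shape))           # shape is preserved: (2, 8, 10, 16)
```

Because the block preserves the (B, C, T, F) shape, it can stand in where a recurrent layer sits between a convolutional encoder and decoder, which is how the abstract describes swapping the second LSTM of DCCRN for a TF-Conformer.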

Keywords
Speech enhancement; Attention; Time-frequency domain
Published
2023-06-10
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-34790-0_21