Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings

Research Article

CARN-Conformer: Conformer in Attention Spectral Mapping Based Convolutional Recurrent Networks for Speech Enhancement

Cite
@INPROCEEDINGS{10.1007/978-3-031-34790-0_21,
    author={Bo Fang and Hongqing Liu and Yi Zhou and Yizhuo Jiang and Lu Gan},
    title={CARN-Conformer: Conformer in Attention Spectral Mapping Based Convolutional Recurrent Networks for Speech Enhancement},
    proceedings={Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings},
    proceedings_a={CHINACOM},
    publisher={Springer},
    year={2023},
    month={6},
    keywords={Speech enhancement; Attention; Time-frequency domain},
    doi={10.1007/978-3-031-34790-0_21}
}
Bo Fang1,*, Hongqing Liu1, Yi Zhou1, Yizhuo Jiang2, Lu Gan2
  • 1: School of Communication and Information Engineering
  • 2: College of Engineering, Design and Physical Science, Brunel University
*Contact email: s200131155@stu.cqupt.edu.cn

Abstract

In recent years, attention-based Transformer models have been widely used in speech enhancement. The convolution-augmented Transformer (Conformer) models both the local and the global information of a speech sequence to achieve better performance. In this paper, we propose a speech enhancement structure that applies the Conformer in the time-frequency (TF) domain within DCCRN. To that end, the second LSTM layer in DCCRN is replaced with a TF-Conformer, so that information both between and within frames can be better utilized. An attention convolution path between the convolutional encoder and decoder is also developed to better convey nonlinear information. The results show that the model's PESQ surpasses DCCRN and DCCRN+ on the test set of the Interspeech 2020 Deep Noise Suppression (DNS) Challenge, with a model size of only 2.3 M parameters. Excellent results are also obtained on the blind test set of the ICASSP 2021 DNS Challenge, where the overall MOS score exceeds that of the winning team by 0.06.
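The core idea described in the abstract — sequence modeling applied along both the time and the frequency axes of a spectrogram feature map — can be sketched as follows. This is a minimal illustrative block in PyTorch, assuming self-attention along each axis with residual connections; the module name `TFBlock` and all dimensions are hypothetical and not the authors' implementation, which additionally uses the full Conformer layer (convolution modules and feed-forward sublayers) inside DCCRN.

```python
# Illustrative sketch of time-then-frequency axis attention on a
# (batch, channels, time, freq) feature map. Not the paper's code.
import torch
import torch.nn as nn

class TFBlock(nn.Module):
    """Apply self-attention along the time axis, then along the frequency axis."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.freq_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm_t = nn.LayerNorm(channels)
        self.norm_f = nn.LayerNorm(channels)

    def forward(self, x):
        # x: (B, C, T, F)
        b, c, t, f = x.shape
        # Time axis: treat each frequency bin as an independent sequence.
        xt = x.permute(0, 3, 2, 1).reshape(b * f, t, c)      # (B*F, T, C)
        at, _ = self.time_attn(xt, xt, xt)
        xt = self.norm_t(xt + at)                            # residual + norm
        x = xt.reshape(b, f, t, c).permute(0, 3, 2, 1)       # back to (B, C, T, F)
        # Frequency axis: treat each time frame as an independent sequence.
        xf = x.permute(0, 2, 3, 1).reshape(b * t, f, c)      # (B*T, F, C)
        af, _ = self.freq_attn(xf, xf, xf)
        xf = self.norm_f(xf + af)
        return xf.reshape(b, t, f, c).permute(0, 3, 1, 2)    # (B, C, T, F)

x = torch.randn(2, 8, 10, 16)   # (batch, channels, frames, freq bins)
y = TFBlock(8)(x)
print(tuple(y.shape))           # shape is preserved: (2, 8, 10, 16)
```

Because the block preserves the (B, C, T, F) shape, it can stand in where a recurrent layer sits between a convolutional encoder and decoder, which is how the abstract describes swapping the second LSTM of DCCRN for a TF-Conformer.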

Keywords
Speech enhancement; Attention; Time-frequency domain
Published
2023-06-10
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-34790-0_21