
Research Article
CARN-Conformer: Conformer in Attention Spectral Mapping Based Convolutional Recurrent Networks for Speech Enhancement
@INPROCEEDINGS{10.1007/978-3-031-34790-0_21,
  author={Bo Fang and Hongqing Liu and Yi Zhou and Yizhuo Jiang and Lu Gan},
  title={CARN-Conformer: Conformer in Attention Spectral Mapping Based Convolutional Recurrent Networks for Speech Enhancement},
  proceedings={Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings},
  proceedings_a={CHINACOM},
  year={2023},
  month={6},
  keywords={Speech enhancement; Attention; Time-frequency domain},
  doi={10.1007/978-3-031-34790-0_21}
}
Bo Fang
Hongqing Liu
Yi Zhou
Yizhuo Jiang
Lu Gan
Year: 2023
CARN-Conformer: Conformer in Attention Spectral Mapping Based Convolutional Recurrent Networks for Speech Enhancement
CHINACOM
Springer
DOI: 10.1007/978-3-031-34790-0_21
Abstract
In recent years, attention-based Transformer models have been widely used in the field of speech enhancement. The convolution-augmented Transformer (Conformer) models both the local and the global information of a speech sequence to achieve better performance. In this paper, we propose a speech enhancement structure that integrates the Conformer into DCCRN in the time-frequency (TF) domain. To that aim, the second LSTM layer in DCCRN is replaced with a TF-Conformer, so that information both within and between frames can be better utilized. An attention convolution path between the convolutional encoder and decoder is also developed to better convey nonlinear information. The results show that the model's PESQ surpasses DCCRN and DCCRN+ on the test set of the Interspeech 2020 Deep Noise Suppression (DNS) Challenge, with a best model size of 2.3 M. Excellent results are also obtained on the blind test set of the ICASSP 2021 DNS Challenge, where the overall MOS score exceeds that of the winning team by 0.06.