FTDCN: Full Two-Dimensional Convolution Network for Speech Enhancement in Time-Frequency Domain

Maoqing Liu; Hongqing Liu; Yi Zhou; Lu Gan

Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings

Research Article

Cite (BibTeX):

@INPROCEEDINGS{10.1007/978-3-031-34790-0_8,
    author={Maoqing Liu and Hongqing Liu and Yi Zhou and Lu Gan},
    title={FTDCN: Full Two-Dimensional Convolution Network for Speech Enhancement in Time-Frequency Domain},
    proceedings={Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings},
    proceedings_a={CHINACOM},
    year={2023},
    month={6},
    keywords={Speech enhancement, Dual-path, End-to-End},
    doi={10.1007/978-3-031-34790-0_8}
}

Year: 2023
CHINACOM
Springer
DOI: 10.1007/978-3-031-34790-0_8

Maoqing Liu¹,*, Hongqing Liu¹, Yi Zhou¹, Lu Gan²

1: School of Communication and Information Engineering
2: College of Engineering, Design and Physical Science, Brunel University

*Contact email: s200131275@stu.cqupt.edu.cn

Abstract

The dual-path structure achieves superior performance in monaural speech enhancement (SE), demonstrating the importance of modeling the long-range spectral patterns of a single frame. In this paper, two novel causal temporal convolutional network (TCN) modules, inter-frame complex-valued two-dimensional TCN (Inter-CTTCN) and intra-frame complex-valued two-dimensional TCN (Intra-CTTCN), are proposed to capture the long-term dependence between frames and the long-range spectral dependence within a single frame, respectively. These two lightweight TCN components, which are composed entirely of two-dimensional convolutions, maintain a high-dimensional feature representation that helps distinguish speech from noise. We combine the Inter-CTTCN and Intra-CTTCN with a gated complex-valued convolutional encoder and decoder structure to design a full two-dimensional convolutional network (FTDCN) for SE in the time-frequency (T-F) domain. Taking noisy speech as input, the proposed model was evaluated on the datasets of the Interspeech 2020 Deep Noise Suppression Challenge (DNS Challenge 2020). The NB-PESQ of the proposed model exceeds that of the first-place DNS Challenge 2020 model by 0.19, and the model requires only 0.8 M parameters.
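
The abstract does not give layer-level details, so the following is only a minimal sketch of the general idea behind a causal, complex-valued two-dimensional convolution over a T-F representation. It is not the authors' Inter-CTTCN or Intra-CTTCN. The function name `causal_complex_conv2d`, the kernel layout, and the zero-padding scheme are all assumptions made for illustration: the kernel looks only at the current and past frames (dilated along time) and at neighbouring frequency bins within each frame.

```python
def causal_complex_conv2d(spec, kernel, dilation=1):
    """Illustrative causal complex-valued 2D convolution (hypothetical helper).

    spec:     complex spectrogram as nested lists, shape [T][F]
    kernel:   complex weights, shape [kt][kf]; kf is assumed to be odd
    dilation: spacing between kernel taps along the time axis
    """
    n_frames, n_bins = len(spec), len(spec[0])
    kt, kf = len(kernel), len(kernel[0])
    half = kf // 2
    out = [[0j] * n_bins for _ in range(n_frames)]
    for t in range(n_frames):
        for f in range(n_bins):
            acc = 0j
            for i in range(kt):
                # The last kernel row aligns with the current frame;
                # earlier rows reach back into past frames only.
                src_t = t - (kt - 1 - i) * dilation
                if src_t < 0:
                    continue  # zero padding on the past side
                for j in range(kf):
                    # Symmetric ("same") zero padding along frequency.
                    src_f = f + j - half
                    if 0 <= src_f < n_bins:
                        # Python complex multiply mixes real and imaginary parts.
                        acc += kernel[i][j] * spec[src_t][src_f]
            out[t][f] = acc
    return out


# Toy input: 6 frames, 4 frequency bins.
spec = [[complex(t, f) for f in range(4)] for t in range(6)]
kernel = [[0.5 + 0.5j, 1 + 0j, 0.5 - 0.5j],
          [0 + 1j, 1 + 0j, 0 - 1j]]
out = causal_complex_conv2d(spec, kernel, dilation=2)
```

Because every tap reads frame `t` or earlier, altering a later frame cannot change earlier outputs, which is the property that makes such a layer usable for frame-by-frame processing.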

Keywords: Speech enhancement, Dual-path, End-to-End

Published: 2023-06-10
Appears in: SpringerLink

URL: http://dx.doi.org/10.1007/978-3-031-34790-0_8
