
Research Article
FTDCN: Full Two-Dimensional Convolution Network for Speech Enhancement in Time-Frequency Domain
@INPROCEEDINGS{10.1007/978-3-031-34790-0_8,
  author={Maoqing Liu and Hongqing Liu and Yi Zhou and Lu Gan},
  title={FTDCN: Full Two-Dimensional Convolution Network for Speech Enhancement in Time-Frequency Domain},
  proceedings={Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings},
  proceedings_a={CHINACOM},
  year={2023},
  month={6},
  keywords={Speech enhancement, Dual-path, End-to-End},
  doi={10.1007/978-3-031-34790-0_8}
}
- Maoqing Liu
- Hongqing Liu
- Yi Zhou
- Lu Gan
Year: 2023
CHINACOM
Springer
DOI: 10.1007/978-3-031-34790-0_8
Abstract
The dual-path structure achieves superior performance in monaural speech enhancement (SE), demonstrating the importance of modeling the long-range spectral patterns of a single frame. In this paper, two novel causal temporal convolutional network (TCN) modules, intra-frame complex-valued two-dimensional TCN (Intra-CTTCN) and inter-frame complex-valued two-dimensional TCN (Inter-CTTCN), are proposed to capture the long-range spectral dependence within a single frame and the long-term dependence between frames, respectively. These two lightweight TCN components, which are composed entirely of two-dimensional convolutions, maintain a high-dimensional feature representation that facilitates the distinction between speech and noise. We combine the Inter-CTTCN and Intra-CTTCN with a gated complex-valued convolutional encoder and decoder structure to design a full two-dimensional convolutional network (FTDCN) for SE in the time-frequency (T-F) domain. Using noisy speech as input, the proposed model was evaluated experimentally on the datasets of the Interspeech 2020 Deep Noise Suppression Challenge (DNS Challenge 2020). The NB-PESQ of the proposed model exceeds that of the first-place model in the DNS Challenge 2020 by 0.19, and the model requires only 0.8 M parameters.
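As a rough illustration of what a causal two-dimensional convolution over a T-F representation could look like, the following sketch applies a dilated kernel that looks only at current and past frames along time while using symmetric zero padding along frequency. It is not the authors' Inter-CTTCN or Intra-CTTCN: the function name `causal_conv2d`, the kernel layout (time taps ordered oldest first), and the padding choices are all assumptions made for this example. Plain Python numbers are used so that complex-valued bins work without extra libraries.

```python
def causal_conv2d(spec, kernel, dilation=1):
    """Hypothetical helper: causal-in-time, 'same'-in-frequency 2D convolution.

    spec:     list of T frames, each a list of F real or complex bins
    kernel:   list of Kt rows (time taps, oldest first), each with Kf weights
    dilation: spacing, in frames, between consecutive time taps
    """
    T, F = len(spec), len(spec[0])
    Kt, Kf = len(kernel), len(kernel[0])
    half = Kf // 2  # symmetric frequency padding; assumes an odd Kf
    out = [[0 for _ in range(F)] for _ in range(T)]
    for t in range(T):
        for f in range(F):
            acc = 0
            for i in range(Kt):
                # Tap i looks back (Kt - 1 - i) * dilation frames, never forward.
                tt = t - (Kt - 1 - i) * dilation
                if tt < 0:
                    continue  # implicit zero padding on the past side
                for j in range(Kf):
                    ff = f + j - half
                    if 0 <= ff < F:  # implicit zero padding at the band edges
                        acc += kernel[i][j] * spec[tt][ff]
            out[t][f] = acc
    return out


# Toy complex "spectrogram": 6 frames, 4 frequency bins (made-up values).
toy_spec = [[complex(t, f) for f in range(4)] for t in range(6)]
toy_kernel = [[0.5, 1.0, 0.5],
              [1.0, 2.0, 1.0]]
toy_out = causal_conv2d(toy_spec, toy_kernel, dilation=2)
```

Because every tap indexes frame `t` or an earlier one, the output at frame `t` cannot change when later frames change, which is the property a causal TCN needs for real-time enhancement. Stacking such layers with growing dilation widens the temporal context without adding taps.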