Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings

Research Article

FTDCN: Full Two-Dimensional Convolution Network for Speech Enhancement in Time-Frequency Domain

Cite
  • @INPROCEEDINGS{10.1007/978-3-031-34790-0_8,
        author={Maoqing Liu and Hongqing Liu and Yi Zhou and Lu Gan},
        title={FTDCN: Full Two-Dimensional Convolution Network for Speech Enhancement in Time-Frequency Domain},
        proceedings={Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings},
        proceedings_a={CHINACOM},
        year={2023},
        month={6},
        keywords={Speech enhancement Dual-path End-to-End},
        doi={10.1007/978-3-031-34790-0_8}
    }
    
  • Maoqing Liu, Hongqing Liu, Yi Zhou, Lu Gan (2023). FTDCN: Full Two-Dimensional Convolution Network for Speech Enhancement in Time-Frequency Domain. CHINACOM. Springer. DOI: 10.1007/978-3-031-34790-0_8
Maoqing Liu1,*, Hongqing Liu1, Yi Zhou1, Lu Gan2
  • 1: School of Communication and Information Engineering
  • 2: College of Engineering, Design and Physical Science, Brunel University
*Contact email: s200131275@stu.cqupt.edu.cn

Abstract

The dual-path structure achieves superior performance in monaural speech enhancement (SE), demonstrating the importance of modeling the long-range spectral patterns of a single frame. In this paper, two novel causal temporal convolutional network (TCN) modules are proposed: the intra-frame complex-valued two-dimensional TCN (Intra-CTTCN), which captures the long-range spectral dependence within a single frame, and the inter-frame complex-valued two-dimensional TCN (Inter-CTTCN), which captures the long-term dependence between frames. These two lightweight TCN components, composed entirely of two-dimensional convolutions, maintain a high-dimensional feature representation that facilitates the distinction between speech and noise. We combine the Inter-CTTCN and Intra-CTTCN with a gated complex-valued convolutional encoder and decoder structure to design a full two-dimensional convolutional network (FTDCN) for SE in the time-frequency (T-F) domain. Using noisy speech as input, the proposed model was evaluated on the datasets of the Interspeech 2020 Deep Noise Suppression Challenge (DNS Challenge 2020). The NB-PESQ of the proposed model exceeds that of the DNS Challenge 2020 first-place model by 0.19, and the model requires only 0.8 M parameters.
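The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind a causal two-dimensional convolution over a time-frequency map, not the authors' Inter-CTTCN or Intra-CTTCN. It assumes a single channel, fixed (unlearned) kernel taps, zero padding, and no gating; the function name `causal_conv2d` and the parameter `dilation_t` are hypothetical. Padding is applied on the past side of the time axis only, so an output frame never depends on future frames, while the frequency axis is padded symmetrically.

```python
def causal_conv2d(x, kernel, dilation_t=1):
    """Single-channel 2-D convolution over a (time, freq) map, causal in time.

    x:      list of T frames, each a list of F values (real or complex)
    kernel: list of k_t rows, each with k_f taps (k_f assumed odd)
    Returns a T x F nested list; out[t] depends only on x[0..t].
    """
    T, F = len(x), len(x[0])
    k_t, k_f = len(kernel), len(kernel[0])
    half_f = k_f // 2
    out = [[0.0] * F for _ in range(T)]
    for t in range(T):
        for f in range(F):
            acc = 0.0
            for i in range(k_t):
                # Kernel row k_t-1 sits on the current frame; earlier rows
                # look back in steps of dilation_t frames.
                tt = t - (k_t - 1 - i) * dilation_t
                if tt < 0:
                    continue  # zero padding on the past side only
                for j in range(k_f):
                    ff = f + j - half_f
                    if 0 <= ff < F:  # symmetric zero padding in frequency
                        acc += kernel[i][j] * x[tt][ff]
            out[t][f] = acc
    return out
```

Because Python's arithmetic handles `complex` values natively, the same sketch accepts a complex spectrogram and complex taps. Stacking such layers with growing `dilation_t` is the usual way a TCN widens its temporal receptive field; whether FTDCN uses that exact schedule is not stated in the abstract.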

Keywords
Speech enhancement, Dual-path, End-to-End
Published
2023-06-10
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-34790-0_8