Communications and Networking. 17th EAI International Conference, Chinacom 2022, Virtual Event, November 19-20, 2022, Proceedings

Research Article

Convolutional Recurrent Neural Network Based on Short-Time Discrete Cosine Transform for Monaural Speech Enhancement

Jinzuo Guo1,*, Yi Zhou1, Hongqing Liu1, Yongbao Ma2
  • 1: School of Communication and Information Engineering
  • 2: Suresense Technology
*Contact email: s200131221@stu.cqupt.edu.cn

Abstract

Speech enhancement algorithms based on deep learning have greatly improved the perceptual quality and intelligibility of speech. Complex-valued neural networks, such as the deep complex convolution recurrent network (DCCRN), make full use of the phase information in the audio signal and achieve superior performance, but complex-valued operations increase computational complexity. Inspired by the deep cosine transform convolutional recurrent network (DCTCRN) model, this paper uses the real-valued discrete cosine transform in place of the complex-valued Fourier transform. The ideal cosine mask is employed as the training target, and a real-valued convolutional recurrent neural network (CRNN) is used to enhance the speech while reducing algorithmic complexity. A frequency-time-LSTM (F-T-LSTM) module is used for better temporal modeling, and a convolutional skip-connection module is introduced between the encoders and the decoders to integrate information across features. Moreover, an improved scale-invariant source-to-noise ratio (SI-SNR) is taken as the loss function, which enables the model to focus more on the parts of the signal that vary and thus obtain better noise suppression. With only 1.31M parameters, the proposed method achieves noise suppression performance that exceeds that of DCCRN and DCTCRN.
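
The three signal-level ingredients named in the abstract (a real-valued short-time discrete cosine transform, a cosine-domain mask, and an SI-SNR objective) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`dct_matrix`, `stdct`, `si_snr`), the frame length, the hop size, and the Hann window are hypothetical choices made for illustration. The mask is assumed to be the element-wise ratio of clean to noisy DCT coefficients, and the loss shown is the standard SI-SNR, not the improved variant proposed in the paper, whose details the abstract does not give.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n); row k is the k-th cosine basis."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return mat


def stdct(x, frame_len=512, hop=128):
    """Short-time DCT: slice into overlapping frames, window, apply a real DCT-II.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return (frames * window) @ dct_matrix(frame_len).T  # real-valued (frames, bins)


def si_snr(estimate, target, eps=1e-8):
    """Standard scale-invariant SNR in dB (higher is better)."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to remove any dependence on gain.
    s_target = np.dot(estimate, target) * target / (np.dot(target, target) + eps)
    e_noise = estimate - s_target
    return 10 * np.log10((np.dot(s_target, s_target) + eps)
                         / (np.dot(e_noise, e_noise) + eps))


# Toy signals: a sine wave stands in for clean speech.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

S, Y = stdct(clean), stdct(noisy)

# Assumed mask definition: ratio of clean to noisy coefficients (real, unbounded).
valid = np.abs(Y) > 1e-8
mask = np.divide(S, Y, out=np.zeros_like(S), where=valid)

print("STDCT shape:", Y.shape, "dtype:", Y.dtype)
print("SI-SNR of noisy input (dB):", si_snr(noisy, clean))
```

Because the DCT coefficients are real, the mask carries sign as well as magnitude, so a real-valued network can in principle recover what a complex-valued model obtains from separate magnitude and phase estimates. In training, the negative of `si_snr` would typically serve as the loss to be minimized.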

Keywords
Speech enhancement, Deep learning, Convolutional recurrent neural network, Discrete cosine transform
Published
2023-06-10
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-34790-0_13