Security and Privacy in New Computing Environments. 6th International Conference, SPNCE 2023, Guangzhou, China, November 25–26, 2023, Proceedings

Research Article

Speech Emotion Recognition Based on Recurrent Neural Networks with Conformer for Emotional Speech Synthesis

Cite
  • @INPROCEEDINGS{10.1007/978-3-031-73699-5_19,
        author={Xin Huang and Chenjing Sun and Jichen Yang and Xianhua Hou},
        title={Speech Emotion Recognition Based on Recurrent Neural Networks with Conformer for Emotional Speech Synthesis},
        proceedings={Security and Privacy in New Computing Environments. 6th International Conference, SPNCE 2023, Guangzhou, China, November 25--26, 2023, Proceedings},
        proceedings_a={SPNCE},
        year={2025},
        month={1},
        keywords={Speech emotion recognition; Emotional speech synthesis; Conformer},
        doi={10.1007/978-3-031-73699-5_19}
    }
    
  • Xin Huang, Chenjing Sun, Jichen Yang, Xianhua Hou. Speech Emotion Recognition Based on Recurrent Neural Networks with Conformer for Emotional Speech Synthesis. SPNCE. Springer, 2025. DOI: 10.1007/978-3-031-73699-5_19
Xin Huang1, Chenjing Sun1, Jichen Yang2,*, Xianhua Hou1
  • 1: School of Electronics and Information Engineering, South China Normal University
  • 2: School of Cyber Security, Guangdong Polytechnic Normal University
*Contact email: nisonyoung@163.com

Abstract

Speech emotion recognition is the basis of emotional speech synthesis: a good speech emotion recognition system can learn more of the emotional expression in speech and thereby help in synthesizing emotional speech. However, several issues make the speech emotion recognition task difficult, including background noise and the distinct speech characteristics of each speaker. The widely recognized speech emotion recognition system ACRNN extracts local features from speech signals using a CNN, and its attention mechanism concentrates on the emotional content of the speech data. However, because only a single attention module is used, it can neither attend simultaneously to information from distinct representation subspaces at different positions nor capture long-term global information. This paper proposes CoRNN, which applies a Conformer in place of the CNN and attention module in order to overcome these shortcomings of ACRNN. Experimental results on the IEMOCAP dataset demonstrate that the unweighted average recall of the proposed CoRNN reaches 65.53%, an improvement of 0.79% over ACRNN.
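To make the described architecture concrete, below is a minimal sketch of a Conformer-plus-recurrent emotion classifier using PyTorch and torchaudio's Conformer module. It is an illustrative assumption, not the authors' implementation: the 80-dimensional log-mel input, the four IEMOCAP emotion classes, all layer sizes, and the mean-pooling head are placeholder choices.

# Hypothetical sketch of a Conformer + RNN emotion classifier (not the paper's code).
import torch
import torch.nn as nn
from torchaudio.models import Conformer

class CoRNNSketch(nn.Module):
    """Conformer encoder followed by a bidirectional LSTM and a linear classifier."""
    def __init__(self, input_dim=80, num_classes=4,
                 num_heads=4, ffn_dim=256, num_layers=4,
                 conv_kernel_size=31, lstm_hidden=128):
        super().__init__()
        # The Conformer stands in for ACRNN's CNN + attention stages.
        self.encoder = Conformer(
            input_dim=input_dim,
            num_heads=num_heads,
            ffn_dim=ffn_dim,
            num_layers=num_layers,
            depthwise_conv_kernel_size=conv_kernel_size,
        )
        # Recurrent layer retained from the ACRNN-style backbone.
        self.rnn = nn.LSTM(input_dim, lstm_hidden,
                           batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, feats, lengths):
        # feats: (batch, frames, input_dim) log-mel features; lengths: (batch,) valid frame counts.
        enc, lengths = self.encoder(feats, lengths)
        out, _ = self.rnn(enc)
        # Mean-pool over valid frames, then classify into emotion categories.
        mask = (torch.arange(out.size(1), device=out.device)[None, :] < lengths[:, None]).unsqueeze(-1)
        pooled = (out * mask).sum(dim=1) / lengths[:, None].clamp(min=1)
        return self.classifier(pooled)

# Toy usage: two utterances with 300 and 240 frames of 80-dim features.
model = CoRNNSketch()
feats = torch.randn(2, 300, 80)
lengths = torch.tensor([300, 240])
print(model(feats, lengths).shape)  # torch.Size([2, 4])

Running the toy example prints logits of shape (2, 4); in the paper's setting such a classifier would be trained on IEMOCAP features and evaluated by unweighted average recall.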

Keywords
Speech emotion recognition; Emotional speech synthesis; Conformer
Published
2025-01-01
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-73699-5_19
Copyright © 2023–2025 ICST