Security and Privacy in New Computing Environments. 6th International Conference, SPNCE 2023, Guangzhou, China, November 25–26, 2023, Proceedings

Research Article

Two-Stage Multi-lingual Speech Emotion Recognition for Multi-lingual Emotional Speech Synthesis

Cite
BibTeX Plain Text
  • @INPROCEEDINGS{10.1007/978-3-031-73699-5_14,
        author={Xin Huang and Zuqiang Zeng and Chenjing Sun and Jichen Yang},
        title={Two-Stage Multi-lingual Speech Emotion Recognition for Multi-lingual Emotional Speech Synthesis},
        proceedings={Security and Privacy in New Computing Environments. 6th International Conference, SPNCE 2023, Guangzhou, China, November 25--26, 2023, Proceedings},
        proceedings_a={SPNCE},
        year={2025},
        month={1},
        keywords={Speech emotion recognition, Multi-lingual, Emotional speech synthesis},
        doi={10.1007/978-3-031-73699-5_14}
    }
    
  • Xin Huang, Zuqiang Zeng, Chenjing Sun, Jichen Yang. Two-Stage Multi-lingual Speech Emotion Recognition for Multi-lingual Emotional Speech Synthesis. SPNCE, Springer, 2025. DOI: 10.1007/978-3-031-73699-5_14
Xin Huang1, Zuqiang Zeng1, Chenjing Sun1, Jichen Yang2,*
  • 1: School of Electronics and Information Engineering, South China Normal University
  • 2: School of Cyber Security, Guangdong Polytechnic Normal University
*Contact email: nisonyoung@163.com

Abstract

In multi-lingual emotional speech synthesis, it is difficult to incorporate suitable emotional expressions into the synthesis process because emotions are expressed differently across languages. In order to extract better emotional representations of different languages to assist multi-lingual emotional speech synthesis, this paper studies multi-lingual speech emotion recognition (SER). In current multi-lingual SER research, the combining method (TCM) and the multi-task method (TMM) are the popular approaches. However, neither achieves good performance: TCM does not consider the emotional differences between languages, and TMM makes it difficult to train a good emotion recognition model and a good language recognition model at the same time. To address this issue, a two-stage multi-lingual SER method is proposed in this paper, in which the language is recognized in the first stage and emotion recognition is applied in the second stage. In addition, wav2vec 2.0 features are used as the input, and ResNet18 is selected as the model for both language recognition and emotion recognition. The experimental results show that the proposed method works for multi-lingual SER and performs better than TCM and TMM.
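
The two-stage pipeline described in the abstract can be sketched as follows: a shared wav2vec 2.0 front-end extracts features, a first ResNet18 identifies the language, and the utterance is then routed to a language-specific ResNet18 emotion classifier. The sketch below uses PyTorch, torchaudio, and torchvision; the class names, the single-channel ResNet adaptation, and all hyper-parameters are illustrative assumptions, not the authors' released implementation.

    # Minimal sketch of a two-stage multi-lingual SER pipeline (assumptions, not the paper's code):
    # stage 1 recognises the language, stage 2 routes the utterance to a
    # language-specific emotion classifier, both on wav2vec 2.0 features.
    import torch
    import torch.nn as nn
    import torchaudio
    import torchvision

    def make_resnet18(num_classes: int) -> nn.Module:
        """ResNet18 adapted to single-channel (feature-map) input."""
        net = torchvision.models.resnet18(num_classes=num_classes)
        net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        return net

    class TwoStageSER(nn.Module):
        def __init__(self, languages, num_emotions):
            super().__init__()
            # Pre-trained wav2vec 2.0 serves as the shared front-end feature extractor.
            self.frontend = torchaudio.pipelines.WAV2VEC2_BASE.get_model()
            self.languages = languages
            # Stage 1: language identification.
            self.lang_clf = make_resnet18(num_classes=len(languages))
            # Stage 2: one emotion classifier per language.
            self.emo_clf = nn.ModuleDict(
                {lang: make_resnet18(num_classes=num_emotions) for lang in languages}
            )

        @torch.no_grad()
        def forward(self, waveform: torch.Tensor):
            # waveform: (batch=1, samples) at 16 kHz.
            feats, _ = self.frontend.extract_features(waveform)
            # Treat the last transformer layer as a (1, 1, frames, dim) "image" for ResNet18.
            x = feats[-1].unsqueeze(1)
            lang_idx = self.lang_clf(x).argmax(dim=-1).item()
            lang = self.languages[lang_idx]
            emotion_logits = self.emo_clf[lang](x)
            return lang, emotion_logits.argmax(dim=-1).item()

    # Example usage (untrained classifier weights, for shape-checking only):
    model = TwoStageSER(languages=["en", "zh"], num_emotions=4).eval()
    wav = torch.randn(1, 16000)  # one second of dummy 16 kHz audio
    print(model(wav))

Routing to a per-language emotion classifier is one plausible reading of the two-stage design; it directly encodes the paper's motivation that emotional expression differs across languages, rather than forcing one classifier to cover all of them.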

Keywords
Speech emotion recognition, Multi-lingual, Emotional speech synthesis
Published
2025-01-01
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-73699-5_14
Copyright © 2023–2025 ICST