
Research Article
Frame Optimization in Speech Emotion Recognition Based on Improved EMD and SVM Algorithms
@INPROCEEDINGS{10.1007/978-3-031-60347-1_11,
  author={Chuan-Jie Guo and Shu-Ya Jin and Yu-Zhe Zhang and Chi-Yuan Ma and Muhammad Adeel and Zhi-Yong Tao},
  title={Frame Optimization in Speech Emotion Recognition Based on Improved EMD and SVM Algorithms},
  proceedings={Mobile Multimedia Communications. 16th EAI International Conference, MobiMedia 2023, Guilin, China, July 22-24, 2023, Proceedings},
  proceedings_a={MOBIMEDIA},
  year={2024},
  month={10},
  keywords={Speech emotion recognition, Improved EMD, Signal framing, SVM},
  doi={10.1007/978-3-031-60347-1_11}
}
Chuan-Jie Guo
Shu-Ya Jin
Yu-Zhe Zhang
Chi-Yuan Ma
Muhammad Adeel
Zhi-Yong Tao
Year: 2024
MOBIMEDIA
Springer
DOI: 10.1007/978-3-031-60347-1_11
Abstract
Emotional features of speech signals are one of the keys to human-computer interaction. However, extracting these features remains difficult and leaves considerable room for improvement, and the signal preprocessing stage is still widely debated. This study divides the speech signal into short frames, each overlapping a portion of the previous frame, and adopts a feature extraction method based on an improved empirical mode decomposition (EMD). The aim is to find the most suitable framing method. Each frame is processed by the improved EMD to generate a set of intrinsic mode functions (IMFs). Multidimensional features are extracted by calculating the center frequency and energy intensity of each IMF and then further processing the center frequencies. Specifically, we focus on the three IMFs with the highest energy intensity. Using the improved algorithm, we investigate how different frame lengths and frame shifts affect the recognition rate for three emotion classes: happy, angry, and sad. We find that the proposed method reaches its highest recognition rate when the signals are split using a 30 ms frame length with a 25% frame shift.
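The framing step and the energy-based IMF selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`frame_signal`, `energy`, `top_k_by_energy`) are hypothetical, the decomposition itself is not shown, and the code assumes that a "25% frame shift" means the hop between frame starts is 25% of the frame length (that is, 75% overlap), which the abstract does not state explicitly.

```python
def frame_signal(samples, sample_rate, frame_ms=30.0, shift_ratio=0.25):
    """Split a sequence of samples into overlapping frames.

    Assumption: the frame shift is the hop between consecutive frame
    starts, given as a fraction of the frame length (0.25 -> 75% overlap).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    hop = max(1, int(round(frame_len * shift_ratio)))

    frames = []
    start = 0
    # Trailing samples that do not fill a whole frame are dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


def energy(component):
    """Energy of one component, taken here as the sum of squared samples."""
    return sum(x * x for x in component)


def top_k_by_energy(components, k=3):
    """Return the indices of the k components with the highest energy."""
    ranked = sorted(
        range(len(components)),
        key=lambda i: energy(components[i]),
        reverse=True,
    )
    return ranked[:k]
```

For a one-second signal sampled at 16 kHz, these defaults give 480-sample frames with a 120-sample hop. In a full pipeline, each frame would be passed to an EMD routine, and `top_k_by_energy` would then be applied to the resulting IMFs of that frame.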