Research Article
Learning Deep Representation of The Emotion Speech Signal
@INPROCEEDINGS{10.4108/eai.6-6-2021.2307539, author={Junyi Duan and Zheng Song and Jianfeng Zhao}, title={Learning Deep Representation of The Emotion Speech Signal}, proceedings={Proceedings of the 8th EAI International Conference on Green Energy and Networking, GreeNets 2021, June 6-7, 2021, Dalian, People's Republic of China}, publisher={EAI}, proceedings_a={GREENETS}, year={2021}, month={8}, keywords={deep representation; deep learning; speech signal}, doi={10.4108/eai.6-6-2021.2307539} }
Junyi Duan
Zheng Song
Jianfeng Zhao
Year: 2021
Learning Deep Representation of The Emotion Speech Signal
GREENETS
EAI
DOI: 10.4108/eai.6-6-2021.2307539
Abstract
This paper aims at learning a deep representation of the emotion speech signal directly from the raw audio clip using a 1D convolutional encoder, and at reconstructing the audio signal using a 1D deconvolutional decoder. The learned deep features, which contain the essential information of the signal, should be robust enough to reconstruct the speech signal. The location of the maximal value in each pooled receptive field of a max pooling layer is passed to the corresponding unpooling layer for reconstructing the audio clip. Residual learning is adopted to ease the training process. A dual training mechanism is developed to enable the decoder to reconstruct the speech signal from the deep representation more accurately: after the convolutional-deconvolutional encoder-decoder is trained as a whole, the decoder is trained again on the transferred features. Experiments conducted on the Berlin EmoDB and SAVEE databases achieve excellent performance.
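The architecture described in the abstract (a 1D convolutional encoder whose max-pooling indices are reused by a mirrored deconvolutional decoder, with residual connections) can be sketched in PyTorch as below. This is a minimal illustration under assumptions: the channel widths, kernel sizes, pooling factor, and network depth are not given in the abstract and are chosen here for demonstration, and the class names (`Encoder`, `Decoder`, `ResBlock1d`) are illustrative, not the authors' code.

```python
# Minimal sketch of the encoder-decoder from the abstract: the encoder
# records the argmax locations of each max pooling layer, and the decoder
# feeds them to the matching unpooling layer when reconstructing the clip.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock1d(nn.Module):
    """Two 1D convolutions with an identity shortcut (residual learning)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + x)  # shortcut eases the training process

class Encoder(nn.Module):
    """Maps a raw audio clip to a deep representation, keeping pooling indices."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 32, kernel_size=9, padding=4)
        self.res1 = ResBlock1d(32)
        self.conv2 = nn.Conv1d(32, 64, kernel_size=9, padding=4)
        self.res2 = ResBlock1d(64)
        self.pool = nn.MaxPool1d(4, return_indices=True)  # keep argmax locations

    def forward(self, x):                  # x: (batch, 1, samples)
        x = F.relu(self.conv1(x))
        x = self.res1(x)
        x, idx1 = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.res2(x)
        z, idx2 = self.pool(x)
        return z, (idx1, idx2)             # z is the deep representation

class Decoder(nn.Module):
    """Mirrors the encoder; unpooling reuses the encoder's pooling indices."""
    def __init__(self):
        super().__init__()
        self.unpool = nn.MaxUnpool1d(4)
        self.res2 = ResBlock1d(64)
        self.deconv2 = nn.ConvTranspose1d(64, 32, kernel_size=9, padding=4)
        self.res1 = ResBlock1d(32)
        self.deconv1 = nn.ConvTranspose1d(32, 1, kernel_size=9, padding=4)

    def forward(self, z, indices):
        idx1, idx2 = indices
        x = self.unpool(z, idx2)           # restore maxima to their locations
        x = self.res2(x)
        x = F.relu(self.deconv2(x))
        x = self.unpool(x, idx1)
        x = self.res1(x)
        return self.deconv1(x)             # reconstructed audio clip

# Shape check on a made-up batch of 1-second, 16 kHz clips.
enc, dec = Encoder(), Decoder()
clip = torch.randn(8, 1, 16000)
z, indices = enc(clip)
recon = dec(z, indices)
loss = F.mse_loss(recon, clip)             # reconstruction objective
```

Passing the pooling indices (as in SegNet-style decoders) lets the unpooling layers place each retained maximum back at its original temporal position instead of a fixed location, which is what allows the decoder to recover fine structure in the waveform.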
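The dual training mechanism can likewise be sketched as two stages: first the encoder-decoder is trained end to end, then the encoder is frozen and the decoder is retrained on the transferred features. The optimizer choice, learning rate, and epoch counts below are assumptions, and `loader` stands in for a hypothetical DataLoader yielding batches of raw clips; `Encoder` and `Decoder` are reused from the previous sketch.

```python
# Hedged sketch of the dual training mechanism from the abstract.
import torch
import torch.nn.functional as F

def train_stage(encoder, decoder, loader, epochs, train_encoder):
    # Stage 2 freezes the encoder so only the decoder is updated.
    for p in encoder.parameters():
        p.requires_grad = train_encoder
    params = list(decoder.parameters())
    if train_encoder:
        params += list(encoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)    # assumed optimizer and lr
    for _ in range(epochs):
        for clip in loader:                    # clip: (batch, 1, samples)
            z, indices = encoder(clip)
            recon = decoder(z, indices)
            loss = F.mse_loss(recon, clip)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Stage 1: train the encoder-decoder as a whole.
train_stage(enc, dec, loader, epochs=50, train_encoder=True)
# Stage 2: keep the transferred encoder features fixed, retrain the decoder.
train_stage(enc, dec, loader, epochs=20, train_encoder=False)
```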