Research Article
Real Time Distant Speech Emotion Recognition in Indoor Environments
@INPROCEEDINGS{10.4108/eai.7-11-2017.2273791,
  author={Mohsin Ahmed and Zeya Chen and Emma Fass and John Stankovic},
  title={Real Time Distant Speech Emotion Recognition in Indoor Environments},
  proceedings={14th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services},
  publisher={ACM},
  proceedings_a={MOBIQUITOUS},
  year={2018},
  month={4},
  keywords={emotion speech noise and reverberation},
  doi={10.4108/eai.7-11-2017.2273791}
}
Authors: Mohsin Ahmed, Zeya Chen, Emma Fass, John Stankovic
Year: 2018
Proceedings: MOBIQUITOUS
Publisher: ACM
DOI: 10.4108/eai.7-11-2017.2273791
Abstract
We address challenges at several stages of the processing pipeline of a real-time, indoor, distant speech emotion recognition system, with the goal of reducing the mismatch between training and test conditions. We combine distorted feature elimination, classifier optimization, and several signal cleaning techniques in a novel way, and we train classifiers with synthetic reverberation obtained from a room impulse response generator, to improve performance across a variety of rooms and source-to-microphone distances. Our comprehensive evaluation is based on a popular emotional corpus from the literature, two new customized datasets, and a dataset made of YouTube videos. The two new datasets are the first distance-aware emotional corpora; we created them by 1) injecting room impulse responses, collected in a variety of rooms at various source-to-microphone distances, into a public emotional corpus; and 2) re-recording the emotional corpus with microphones placed at different distances. The overall performance results show as much as 15.51% improvement in distant emotion detection over baselines, with a final emotion recognition accuracy ranging from 79.44% to 95.89% across different rooms, acoustic configurations, and source-to-microphone distances. We experimentally evaluate the CPU time of the various system components and demonstrate the real-time capability of our system.
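As a rough illustration of what "injecting" a room impulse response into clean speech involves, the sketch below convolves a signal with an impulse response. It is not the authors' pipeline: the exponentially decaying noise model stands in for a real room impulse response generator, and the function names (`synthetic_rir`, `convolve`, `add_reverb`) and all parameter values are hypothetical, chosen only for this example.

```python
import math
import random


def synthetic_rir(rt60_s, sample_rate, length_s, seed=0):
    """Toy impulse response (an assumption, not the paper's generator):
    Gaussian noise whose amplitude decays by 60 dB after rt60_s seconds."""
    rng = random.Random(seed)
    n = int(length_s * sample_rate)
    # A 60 dB amplitude drop is a factor of 1000, so the rate is ln(1000) / RT60.
    decay = math.log(1000.0) / rt60_s
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * i / sample_rate)
           for i in range(n)]
    rir[0] = 1.0  # direct path from source to microphone
    return rir


def convolve(signal, kernel):
    """Direct-form linear convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def add_reverb(clean, rir):
    """Reverberate a clean signal, then peak-normalize to avoid clipping."""
    wet = convolve(clean, rir)
    peak = max(abs(x) for x in wet) or 1.0
    return [x / peak for x in wet]


# Hypothetical usage: a 50 ms, 440 Hz tone stands in for a clean utterance.
sample_rate = 8000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(400)]
rir = synthetic_rir(rt60_s=0.3, sample_rate=sample_rate, length_s=0.1)
reverberant = add_reverb(clean, rir)
```

In practice a measured or simulated impulse response for each room and distance would replace `synthetic_rir`, and an FFT-based convolution would replace the direct-form loop for full-length recordings.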