14th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services

Research Article

Real Time Distant Speech Emotion Recognition in Indoor Environments

  • @INPROCEEDINGS{10.4108/eai.7-11-2017.2273791,
        author={Mohsin Ahmed and Zeya Chen and Emma Fass and John Stankovic},
        title={Real Time Distant Speech Emotion Recognition in Indoor Environments},
        proceedings={14th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services},
        publisher={ACM},
        proceedings_a={MOBIQUITOUS},
        year={2018},
        month={4},
        keywords={emotion, speech, noise and reverberation},
        doi={10.4108/eai.7-11-2017.2273791}
    }
    
Mohsin Ahmed1,*, Zeya Chen1, Emma Fass1, John Stankovic1
  • 1: University of Virginia
*Contact email: mya5dm@virginia.edu

Abstract

We develop solutions to challenges at each stage of the processing pipeline of a real-time indoor distant speech emotion recognition system, reducing the discrepancy between training and test conditions for distant emotion recognition. We use a novel combination of distorted-feature elimination, classifier optimization, several signal-cleaning techniques, and classifiers trained with synthetic reverberation obtained from a room impulse response generator to improve performance across a variety of rooms and source-to-microphone distances. Our comprehensive evaluation is based on a popular emotional corpus from the literature, two new customized datasets, and a dataset drawn from YouTube videos. The two new datasets are the first distance-aware emotional corpora; we created them by 1) injecting room impulse responses, collected in a variety of rooms at various source-to-microphone distances, into a public emotional corpus, and 2) re-recording the emotional corpus with microphones placed at different distances. The overall results show as much as a 15.51% improvement in distant emotion detection over baselines, with final emotion recognition accuracy ranging from 79.44% to 95.89% across different rooms, acoustic configurations, and source-to-microphone distances. We experimentally evaluate the CPU time of each system component and demonstrate the real-time capability of our system.
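
As an illustration of the reverberation-injection step described in the abstract, the sketch below shows how a clean utterance from an emotional corpus might be convolved with a room impulse response (RIR) to synthesize a reverberant training example. This is a minimal sketch, not the authors' implementation: the paper uses an RIR generator, whereas this code substitutes a toy exponentially decaying noise RIR, and the sample rate, RT60 value, and placeholder signal are illustrative assumptions.

    # Minimal sketch (not the paper's code): create reverberant training data
    # by convolving clean speech with a room impulse response (RIR).
    import numpy as np
    from scipy.signal import fftconvolve

    def synthetic_rir(fs=16000, rt60=0.5, length_s=1.0, seed=0):
        # Toy RIR: white noise shaped by an exponential decay whose rate
        # gives a 60 dB amplitude drop over rt60 seconds (k = 3*ln(10)/rt60).
        rng = np.random.default_rng(seed)
        t = np.arange(int(fs * length_s)) / fs
        return rng.standard_normal(t.size) * np.exp(-6.908 * t / rt60)

    def reverberate(clean, rir):
        # Convolve the utterance with the RIR, trim to the original
        # length, and renormalize to avoid clipping.
        wet = fftconvolve(clean, rir)[: len(clean)]
        return wet / (np.max(np.abs(wet)) + 1e-12)

    # Usage: substitute a real utterance from the emotional corpus here.
    fs = 16000
    clean = np.random.default_rng(1).standard_normal(2 * fs)  # 2 s placeholder
    wet = reverberate(clean, synthetic_rir(fs=fs, rt60=0.5))

Varying rt60 and the RIR (or using impulse responses measured at different source-to-microphone distances) yields the kind of distance-aware training set the abstract describes.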