Research Article
How’s my Mood and Stress? An Efficient Speech Analysis Library for Unobtrusive Monitoring on Mobile Phones
@INPROCEEDINGS{10.4108/icst.bodynets.2011.247079,
  author={Keng-hao Chang and Drew Fisher and John Canny and Bjoern Hartmann},
  title={How’s my Mood and Stress? An Efficient Speech Analysis Library for Unobtrusive Monitoring on Mobile Phones},
  proceedings={6th International ICST Conference on Body Area Networks},
  publisher={ICST},
  proceedings_a={BODYNETS},
  year={2012},
  month={6},
  keywords={health care mental health monitor mobile phones voice analysis toolkit},
  doi={10.4108/icst.bodynets.2011.247079}
}
Keng-hao Chang
Drew Fisher
John Canny
Bjoern Hartmann
Year: 2012
Conference: BODYNETS
Publisher: ICST
DOI: 10.4108/icst.bodynets.2011.247079
Abstract
The human voice encodes a wealth of information about emotion, mood, stress, and mental state. With mobile phones, among the most widely used devices in body area networks, this information is potentially available to a host of applications and can enable richer, more appropriate, and more satisfying human-computer interaction. In this paper we describe the AMMON (Affective and Mental health MONitor) library, a low-footprint C library designed for widely available phones as an enabler of these applications. The library incorporates both the core features for emotion recognition (from the Interspeech 2009 Emotion Challenge) and the most important features for mental health analysis (glottal timing features). To run comfortably on feature phones (the most widely used class of phones today), we implemented the routines in fixed-point arithmetic and minimized the computational and memory footprint. On identical test data, emotion and stress classification accuracy was indistinguishable from a state-of-the-art reference system running on a PC, achieving 75% accuracy on two-class emotion classification tasks and 84% accuracy on binary classification of stressed and neutral situations. The library uses 30% of real time on a 1 GHz processor during emotion recognition and 70% during stress and mental health analysis.
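The abstract's point about fixed-point arithmetic is what makes the signal-processing routines practical on feature phones without floating-point hardware. As a rough illustration only, the sketch below shows a rounded Q15 multiply in C; the Q15 format, the function names, and the example values are assumptions made here for illustration and are not taken from AMMON's actual code.

```c
/*
 * Minimal Q15 fixed-point sketch: an illustration of the kind of
 * fixed-point arithmetic described in the abstract. The Q15 format,
 * names, and constants are assumptions, not AMMON's API.
 */
#include <stdint.h>
#include <stdio.h>

typedef int16_t q15_t;   /* 1 sign bit, 15 fractional bits */

/* Convert a double in [-1, 1) to Q15, saturating at the boundaries. */
static q15_t q15_from_double(double x) {
    if (x >= 32767.0 / 32768.0) return INT16_MAX;
    if (x <= -1.0) return INT16_MIN;
    return (q15_t)(x * 32768.0);
}

/* Convert Q15 back to double (handy when checking results on a PC). */
static double q15_to_double(q15_t x) {
    return (double)x / 32768.0;
}

/* Multiply two Q15 values: the 32-bit product is Q30, so add half an
 * LSB for rounding and shift right by 15 to return to Q15.
 * Saturation of the result is omitted for brevity. */
static q15_t q15_mul(q15_t a, q15_t b) {
    int32_t p = (int32_t)a * (int32_t)b + (1 << 14);
    return (q15_t)(p >> 15);
}

int main(void) {
    q15_t window = q15_from_double(0.54);   /* e.g. a Hamming window coefficient */
    q15_t sample = q15_from_double(-0.25);  /* e.g. a normalized audio sample */
    printf("0.54 * -0.25 ~= %f\n", q15_to_double(q15_mul(window, sample)));
    return 0;
}
```

The general pattern, widening to 32-bit intermediates and shifting back to 16 bits, is how windowing, filtering, and feature-extraction steps would typically be kept within the integer-only budget the abstract describes.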