1st International ICST Conference on Ambient Media and Systems

Research Article

Improving Dialogue Systems in a Home Automation Environment

  • @INPROCEEDINGS{10.4108/ICST.AMBISYS2008.2878,
        author={Raquel Justo and Oscar Saz and V\'{\i}ctor Guijarrubia and Antonio Miguel and M. In\'{e}s Torres and Eduardo Lleida},
        title={Improving Dialogue Systems in a Home Automation Environment},
        proceedings={1st International ICST Conference on Ambient Media and Systems},
        publisher={ICST},
        proceedings_a={AMBI-SYS},
        year={2010},
        month={5},
        keywords={Design, Performance},
        doi={10.4108/ICST.AMBISYS2008.2878}
    }
    
Raquel Justo1,*, Oscar Saz2,*, Víctor Guijarrubia1,*, Antonio Miguel2,*, M. Inés Torres1,*, Eduardo Lleida2,*
  • 1: University of the Basque Country, 48940 Leioa, Spain
  • 2: University of Zaragoza, Zaragoza, Spain
*Contact email: raquel.justo@ehu.es, oskarsaz@unizar.es, vgga@we.lc.ehu.es, amiguel@unizar.es, manes.torres@ehu.es, lleida@unizar.es

Abstract

In this paper, a task of human-machine interaction based on speech is presented. The specific task consists of the use and control of a set of home appliances through a turn-based dialogue system. This work focuses on the first stage of the dialogue system, the Automatic Speech Recognition (ASR) system. Two lines of work are pursued to improve the performance of the ASR system. On the one hand, the acoustic modeling required for ASR is improved via Speaker Adaptation techniques. On the other hand, the Language Modeling in the system is improved by the use of class-based Language Models. The results show that both techniques are effective at improving the ASR results, as the Word Error Rate (WER) drops from 5.81% to 0.99% with a close-talk microphone and from 14.53% to 1.52% with a lapel microphone. An important reduction is also achieved in terms of the Category Error Rate (CER), which measures the ability of the ASR system to extract the semantic information of the uttered sentence, dropping from 6.13% and 15.32% to 1.29% and 1.32% for the two microphones used in the experiments.
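
For readers unfamiliar with the reported metrics, the sketch below shows how a Word Error Rate figure of the kind quoted in the abstract is typically computed, namely as the Levenshtein edit distance between a reference transcription and the ASR hypothesis, normalized by the reference length. This is an illustrative example only, not code or data from the paper; the example sentences are hypothetical, and a Category Error Rate in the paper's sense would be obtained by running the same comparison over category (semantic class) labels rather than words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via dynamic-programming edit distance over word sequences."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical home-automation utterance, not taken from the paper's corpus.
    ref = "turn on the living room lights"
    hyp = "turn on living room light"
    print(f"WER = {100 * word_error_rate(ref, hyp):.2f}%")
```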