2nd International ICST Conference on Communications and Networking in China

Research Article

A Natural Chinese speech Driven Mouth Animation System

  • @INPROCEEDINGS{10.1109/CHINACOM.2007.4469473,
        author={Ming Xu and Jianjun Ouyang and Yunsen Huang},
        title={A Natural Chinese speech Driven Mouth Animation System},
        proceedings={2nd International ICST Conference on Communications and Networking in China},
        publisher={IEEE},
        proceedings_a={CHINACOM},
        year={2008},
        month={3},
        keywords={MPEG-4 FAPs, Speech driven, mouth animation, triseme, viseme},
        doi={10.1109/CHINACOM.2007.4469473}
    }
    
Ming Xu1,*, Jianjun Ouyang2,*, Yunsen Huang1,*
  • 1: Information Center, Shenzhen University, Shenzhen, China
  • 2: College of Information Engineering, Shenzhen University, Shenzhen, China
*Contact email: xuming@szu.edu.cn, oudyyang@163.com, huangys@szu.edu.cn

Abstract

Unlike phoneme-based research on human mouth animation, this paper presents a novel mouth animation approach driven by natural speech. To recognize the sequence of mouth shapes from continuous speech, context-dependent viseme (triseme) modeling is employed to obtain trisemic HMMs. To obtain robust model parameters from limited training data, a state-tying procedure is introduced. To address compatibility and ambiguity issues, the visemic questions assigned to the leaf nodes of the decision tree are generated from the training data. With the trained HMM parameters, the Viterbi beam search algorithm is applied to time-align the triseme sequences. The recognized trisemes are then mapped to the corresponding mouth shapes, represented as MPEG-4 FAPs, and the speaking mouth is animated after a smoothing process. The experimental results demonstrate that the recognition accuracy is adequate for the application and that recognition and alignment are fast enough to be acceptable to human visual perception.
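
The context-dependent labeling and the final smoothing step can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `left-centre+right` label notation is borrowed from the common triphone convention, the `sil` boundary symbol, the function names, and the single scalar "mouth opening" value standing in for a set of MPEG-4 FAPs are all hypothetical, and the moving-average filter is only one possible choice, since the abstract does not say which smoothing method is used.

```python
from typing import Dict, List, Tuple


def to_trisemes(visemes: List[str], boundary: str = "sil") -> List[str]:
    """Expand a viseme sequence into context-dependent 'left-centre+right' labels.

    The boundary symbol pads both ends so every viseme has two neighbours.
    Viseme names are assumed not to contain '-' or '+'.
    """
    padded = [boundary] + list(visemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def frames_from_alignment(
    alignment: List[Tuple[str, int]], targets: Dict[str, float]
) -> List[float]:
    """Hold the centre viseme's target value for its aligned duration in frames.

    `alignment` pairs each triseme label with a frame count, as a time
    alignment step might produce; `targets` maps a viseme to one
    illustrative mouth parameter value (hypothetical numbers).
    """
    track: List[float] = []
    for label, n_frames in alignment:
        centre = label.split("-")[1].split("+")[0]
        track.extend([targets[centre]] * n_frames)
    return track


def moving_average(track: List[float], window: int = 3) -> List[float]:
    """Smooth a per-frame track with a centred moving average.

    Near the edges the window shrinks to the frames that exist.
    """
    half = window // 2
    smoothed: List[float] = []
    for i in range(len(track)):
        lo = max(0, i - half)
        hi = min(len(track), i + half + 1)
        smoothed.append(sum(track[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    labels = to_trisemes(["a", "m", "a"])
    print(labels)
    # Hypothetical alignment: two frames per triseme.
    aligned = [(label, 2) for label in labels]
    raw = frames_from_alignment(aligned, {"a": 1.0, "m": 0.0})
    print(moving_average(raw))
```

Holding each target for its aligned duration yields a step-shaped track. The smoothing pass turns those steps into gradual transitions, which is the role the smoothing process plays before the mouth is animated.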