Advances in Computer Science and Information Technology. Computer Science and Engineering. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part II

Research Article

Speaker Independent Connected Digit Recognition Using VQ and HMM in Additive Noise Environment

  • @INPROCEEDINGS{10.1007/978-3-642-27308-7_43,
        author={A. Revathi and Y. Venkataramani},
        title={Speaker Independent Connected Digit Recognition Using VQ and HMM in Additive Noise Environment},
        proceedings={Advances in Computer Science and Information Technology. Computer Science and Engineering. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part II},
        proceedings_a={CCSIT PATR II},
        year={2012},
        month={11},
        keywords={Hidden markov model (HMM) Frequency response Speech recognition Vector quantization (VQ) Perceptual linear predictive cepstrum (PLP) Noise Wavelet transform Recursive least square (RLS) filtering},
        doi={10.1007/978-3-642-27308-7_43}
    }
    
  • A. Revathi
    Y. Venkataramani
    Year: 2012
    Speaker Independent Connected Digit Recognition Using VQ and HMM in Additive Noise Environment
    CCSIT PATR II
    Springer
    DOI: 10.1007/978-3-642-27308-7_43
A. Revathi1,*, Y. Venkataramani1,*
  • 1: Saranathan College of Engineering
*Contact email: revathidhanabal@rediffmail.com, principal@saranathan.ac.in

Abstract

The main objective of this paper is to discuss the effectiveness of concatenated perceptual features, together with a noise reduction technique based on the wavelet transform and recursive least squares (RLS) filtering, in achieving a good recognition rate for particular combinations of connected digits in an additive noise environment. The proposed concatenated perceptual features are extracted from the speech and vector quantized to obtain code book indices. The expectation maximization algorithm is used to generate discrete HMMs for the connected digits. The speech recognition system is evaluated on clean and noisy test speech, and the recognized digit is the one whose model gives the maximum log likelihood value. Speech for this work is randomly chosen from the “TI Digits1” and “TI Digits2” databases. The concatenated perceptual features yield accuracies of 81.4% and 73% for the connected digit combinations (10-19) and (12-19, 21, 31, 41, 51, 61, 71, 81, 91), respectively. Pink noise, white noise, babble noise and factory noise are considered in this work.
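
The recognition step described in the abstract (code book indices scored against discrete HMMs, with the maximum log likelihood model selected) can be illustrated with a minimal sketch. This is not the authors' implementation: the two-entry code book, the two-state model parameters and the labels "10" and "11" are toy values assumed purely for illustration, the feature frames stand in for the concatenated perceptual features, and the models are written by hand rather than trained with expectation maximization.

```python
import math

def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword
    (squared Euclidean distance)."""
    indices = []
    for x in frames:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.
    pi[i] is the initial probability of state i, A[i][j] the transition
    probability i -> j, B[i][k] the probability of symbol k in state i."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik

def recognize(frames, codebook, models):
    """Return the label of the model with the maximum log likelihood."""
    obs = quantize(frames, codebook)
    scores = {label: log_likelihood(obs, *params)
              for label, params in models.items()}
    return max(scores, key=scores.get), scores

# Toy data (assumed for illustration only).
codebook = [(0.0, 0.0), (1.0, 1.0)]
transitions = [[0.5, 0.5], [0.0, 1.0]]  # left-to-right topology
models = {
    "10": ([1.0, 0.0], transitions, [[0.9, 0.1], [0.1, 0.9]]),
    "11": ([1.0, 0.0], transitions, [[0.1, 0.9], [0.9, 0.1]]),
}
frames = [(0.1, 0.0), (0.0, 0.2), (0.9, 1.1), (1.0, 0.8)]
best_label, scores = recognize(frames, codebook, models)
```

Per-frame scaling in the forward pass keeps the computation numerically stable for long utterances; the log likelihood is recovered as the sum of the log scale factors.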