IoT as a Service. Third International Conference, IoTaaS 2017, Taichung, Taiwan, September 20–22, 2017, Proceedings

Research Article

An Optimized Implementation of Speech Recognition Combining GPU with Deep Belief Network for IoT

  • @INPROCEEDINGS{10.1007/978-3-030-00410-1_30,
        author={Weipeng Jing and Tao Jiang and Mithun Mukherjee and Lei Shu and Jian Kang},
        title={An Optimized Implementation of Speech Recognition Combining GPU with Deep Belief Network for IoT},
        proceedings={IoT as a Service. Third International Conference, IoTaaS 2017, Taichung, Taiwan, September 20--22, 2017, Proceedings},
        proceedings_a={IOTAAS},
        year={2018},
        month={10},
        keywords={IoT; Speech recognition; DBN; GPU; Parallel computation; Mobile computing},
        doi={10.1007/978-3-030-00410-1_30}
    }
    
Weipeng Jing1,*, Tao Jiang1,*, Mithun Mukherjee2,*, Lei Shu2,*, Jian Kang1,*
  • 1: Northeast Forestry University
  • 2: Guangdong University of Petrochemical Technology
*Contact email: weipeng.jing@outlook.com, taojiang920619@outlook.com, m.mukherjee@ieee.org, lei.shu@ieee.org, laurelkang@outlook.com

Abstract

With the advancement of the Internet of Things (IoT), speech recognition in mobile-terminal applications has become a new trend. Consequently, how to accelerate training and improve accuracy in speech recognition has attracted attention from both academia and industry. Although a Deep Belief Network (DBN) accelerated by a Graphics Processing Unit (GPU) is commonly applied in the acoustic model of speech recognition, critical research challenges remain: the GPU cannot store all of the DBN's parameters at one time, the GPU's shared memory is not fully utilized, and parameter transmission becomes a bottleneck across multiple GPUs. This paper presents a new method in which the weight matrix is divided into sub-weight matrices and a reasonable memory model is established. To eliminate inefficient idle time during data transfers, a stream-processing model is proposed in which data transfer and kernel execution are performed simultaneously. Further, the optimized single-GPU implementation is extended to multiple GPUs to address the parameter-transmission bottleneck. Experimental results show that the optimized GPU implementation does not violate the size limitation of the GPU's memory.
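The stream-processing idea summarized in the abstract — splitting the weight matrix into sub-matrices and overlapping each chunk's host-to-device copy with kernel execution on another chunk — can be sketched in CUDA as below. This is a minimal illustrative sketch, not the paper's implementation: the kernel name `rbm_forward`, the chunk sizes, and the two-stream layout are all assumptions introduced here.

```cuda
// Sketch of pipelining sub-weight-matrix transfers with kernel execution
// using two CUDA streams. Pinned host memory is required for truly
// asynchronous cudaMemcpyAsync. Illustrative only; error checking omitted.
#include <cuda_runtime.h>
#include <stdio.h>

// Placeholder for a DBN layer computation (element-wise sigmoid here).
__global__ void rbm_forward(const float *w, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0f / (1.0f + expf(-w[i]));
}

int main(void) {
    const int chunks = 4;            // weight matrix split into 4 sub-matrices
    const int chunkN = 1 << 20;      // elements per sub-matrix (assumed size)
    float *h_w, *d_w, *d_out;
    cudaMallocHost(&h_w, chunks * chunkN * sizeof(float));   // pinned memory
    cudaMalloc(&d_w,  chunks * chunkN * sizeof(float));
    cudaMalloc(&d_out, chunks * chunkN * sizeof(float));
    for (int i = 0; i < chunks * chunkN; ++i) h_w[i] = 0.01f * (i % 100);

    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);
    for (int c = 0; c < chunks; ++c) {
        cudaStream_t st = s[c % 2];  // alternate streams so chunk c's copy
                                     // overlaps chunk c-1's kernel
        size_t off = (size_t)c * chunkN;
        cudaMemcpyAsync(d_w + off, h_w + off, chunkN * sizeof(float),
                        cudaMemcpyHostToDevice, st);
        rbm_forward<<<(chunkN + 255) / 256, 256, 0, st>>>(
            d_w + off, d_out + off, chunkN);
    }
    cudaDeviceSynchronize();
    printf("processed %d sub-weight matrices\n", chunks);

    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    cudaFreeHost(h_w);
    cudaFree(d_w);
    cudaFree(d_out);
    return 0;
}
```

On hardware with a dedicated copy engine, work issued to different streams lets the memory transfer of one sub-matrix proceed while the kernel for another executes, which is the idle-state elimination the abstract describes.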