Testbeds and Research Infrastructures for the Development of Networks and Communications. 14th EAI International Conference, TridentCom 2019, Changsha, China, December 7-8, 2019, Proceedings

Research Article

Power Micro-Blog Text Classification Based on Domain Dictionary and LSTM-RNN

  • @INPROCEEDINGS{10.1007/978-3-030-43215-7_3,
        author={Meng-yao Shen and Jing-sheng Lei and Fei-ye Du and Zhong-qin Bi},
        title={Power Micro-Blog Text Classification Based on Domain Dictionary and LSTM-RNN},
        proceedings={Testbeds and Research Infrastructures for the Development of Networks and Communications. 14th EAI International Conference, TridentCom 2019, Changsha, China, December 7-8, 2019, Proceedings},
        proceedings_a={TRIDENTCOM},
        year={2020},
        month={3},
        keywords={Text classification, Power micro-blog, Domain dictionary, Word vector, Classification accuracy, LSTM-RNN},
        doi={10.1007/978-3-030-43215-7_3}
    }
    
  • Meng-yao Shen, Jing-sheng Lei, Fei-ye Du, Zhong-qin Bi (2020). Power Micro-Blog Text Classification Based on Domain Dictionary and LSTM-RNN. TRIDENTCOM. Springer. DOI: 10.1007/978-3-030-43215-7_3
Meng-yao Shen¹, Jing-sheng Lei¹, Fei-ye Du¹, Zhong-qin Bi¹,*
  • 1: Shanghai University of Electric Power
*Contact email: zqbi@shiep.edu.cn

Abstract

Micro-blog texts from the provinces and cities of the national grid, including both the micro-blogs and their corresponding comments, are analyzed as the main data to help understand events in the power industry and people’s attitudes towards these events. In this work, the data set is composed of 420,000 micro-blog texts. Firstly, the professional vocabulary of electric power is extracted and manually labeled, yielding a new domain dictionary closely related to the power industry. Secondly, the new power domain dictionary is used to classify the 2018 electric power micro-blogs, which raises classification accuracy from 88.7% to 95.2%. Finally, a classification model based on LSTM (Long Short-Term Memory) and RNN (Recurrent Neural Network) is used to classify the comments under the micro-blogs. The experimental results show that the LSTM-RNN model is more accurate: its accuracy is 83.1%, significantly better than the 78.4% and 73.1% achieved by the traditional LSTM and RNN text classification models, respectively.
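The abstract does not say how the domain dictionary is applied during classification. As a rough illustration only, the sketch below assumes one plausible reading: each dictionary term is manually mapped to a category, and a text receives the category whose terms occur in it most often. The dictionary entries, category names, and the functions `classify` and `accuracy` are hypothetical and are not taken from the paper. The tokenizer is a simple English regular expression, so a real pipeline would need word segmentation suited to the language of the micro-blogs.

```python
import re
from collections import Counter

# Hypothetical domain dictionary: term -> manually assigned category.
# These entries are illustrative only and do not come from the paper.
DOMAIN_DICT = {
    "outage": "fault",
    "blackout": "fault",
    "transformer": "equipment",
    "substation": "equipment",
    "tariff": "billing",
    "bill": "billing",
}


def classify(text, domain_dict=DOMAIN_DICT, default="other"):
    """Return the category whose dictionary terms occur most often in text.

    Texts that contain no dictionary term fall back to `default`.
    """
    # Simplifying assumption: lowercase alphabetic tokens are enough.
    tokens = re.findall(r"[a-z]+", text.lower())
    votes = Counter(domain_dict[t] for t in tokens if t in domain_dict)
    if not votes:
        return default
    return votes.most_common(1)[0][0]


def accuracy(predictions, labels):
    """Fraction of predictions that equal the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

Under these assumptions, `classify("Power outage after the transformer blackout")` counts two "fault" terms against one "equipment" term and returns "fault". Accuracy figures such as the 88.7% and 95.2% reported above would be obtained by comparing predictions against manually labeled texts, as `accuracy` does.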