Research Article
Power Micro-Blog Text Classification Based on Domain Dictionary and LSTM-RNN
@INPROCEEDINGS{10.1007/978-3-030-43215-7_3,
  author={Meng-yao Shen and Jing-sheng Lei and Fei-ye Du and Zhong-qin Bi},
  title={Power Micro-Blog Text Classification Based on Domain Dictionary and LSTM-RNN},
  proceedings={Testbeds and Research Infrastructures for the Development of Networks and Communications. 14th EAI International Conference, TridentCom 2019, Changsha, China, December 7-8, 2019, Proceedings},
  proceedings_a={TRIDENTCOM},
  year={2020},
  month={3},
  keywords={Text classification, Power micro-blog, Domain dictionary, Word vector, Classification accuracy, LSTM-RNN},
  doi={10.1007/978-3-030-43215-7_3}
}
Meng-yao Shen
Jing-sheng Lei
Fei-ye Du
Zhong-qin Bi
Year: 2020
TRIDENTCOM
Springer
DOI: 10.1007/978-3-030-43215-7_3
Abstract
This work analyzes the micro-blog texts of the national grid's provinces and cities, including the micro-blogs and their corresponding comments, to help understand events in the power industry and people's attitudes towards these events. The data set is composed of 420,000 micro-blog texts. First, the professional vocabulary of electric power is extracted and manually labeled, yielding a new domain dictionary closely related to the power industry. Second, the new power domain dictionary is used to classify the 2018 electric power micro-blogs, which raises classification accuracy from 88.7% to 95.2%. Finally, a classification model based on LSTM (Long Short-Term Memory) and RNN (Recurrent Neural Network) is used to classify the comments under the micro-blogs. The experimental results show that the LSTM-RNN model is more accurate: its accuracy was 83.1%, significantly better than the 78.4% and 73.1% achieved by the traditional LSTM and RNN text classification models.
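The dictionary-based step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the dictionary entries, the label set, or the matching rule, so everything below is an assumption made for illustration: the terms and labels in `DOMAIN_DICTIONARY` are hypothetical, and the `classify` function uses simple substring counting rather than whatever segmentation and scoring the authors applied.

```python
from collections import Counter

# Hypothetical domain dictionary: term -> manually assigned label.
# These entries are illustrative only and are not taken from the paper.
DOMAIN_DICTIONARY = {
    "power outage": "fault",
    "blackout": "fault",
    "transformer": "equipment",
    "substation": "equipment",
    "electricity bill": "billing",
    "tariff": "billing",
}


def classify(text, dictionary=DOMAIN_DICTIONARY, default="other"):
    """Return the label whose dictionary terms occur most often in text."""
    lowered = text.lower()
    votes = Counter()
    for term, label in dictionary.items():
        hits = lowered.count(term)  # non-overlapping substring matches
        if hits:
            votes[label] += hits
    if not votes:
        # No domain term matched, so fall back to the default label.
        return default
    # most_common(1) yields the (label, count) pair with the highest count.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify("Blackout downtown after a transformer fire, another blackout expected"))
```

A text with no dictionary hits falls back to the `default` label, which is one plausible reason a richer domain dictionary would raise accuracy: more domain texts receive a specific label instead of the fallback.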