
Research Article
Short Text Data Mining Based on Incremental AP Clustering
@INPROCEEDINGS{10.1007/978-3-031-65126-7_31,
  author={Fuyu Lu and Ying Guo and Peiyi Qu and Yonglin Leng},
  title={Short Text Data Mining Based on Incremental AP Clustering},
  proceedings={Quality, Reliability, Security and Robustness in Heterogeneous Systems. 19th EAI International Conference, QShine 2023, Shenzhen, China, October 8 -- 9, 2023, Proceedings, Part I},
  proceedings_a={QSHINE},
  year={2024},
  month={8},
  keywords={Short Text Vector Representation Model Incremental AP Clustering},
  doi={10.1007/978-3-031-65126-7_31}
}
Fuyu Lu
Ying Guo
Peiyi Qu
Yonglin Leng
Year: 2024
QSHINE
Springer
DOI: 10.1007/978-3-031-65126-7_31
Abstract
The rapid development of mobile internet technology generates large volumes of short text data, which contain many hot topics. Clustering this data makes it possible to identify hot topics in a timely manner, information that is crucial for discovering public opinion and analyzing user emotions. This paper proposes a hybrid vector representation model (HVRM) that combines weight and topic features to address the loss of feature information caused by relying on a single short text vector representation model and by the sparsity of short text. First, HVRM mines local features using Word2Vec and TF-IDF to obtain a weighted vector for each short text. Next, it uses BTM to obtain global feature vectors. The two feature vectors are then concatenated to form the short text vector. Finally, KNN is used to initialize the responsibility and availability matrices of incremental AP clustering (IAPC). The experimental results show that the proposed hybrid vector representation model effectively improves clustering performance.
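The vector construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings stand in for trained Word2Vec vectors, the toy topic distributions stand in for BTM output, and the smoothed IDF formula and the normalization by the sum of weights are assumptions made here for illustration. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf weight} dict per tokenized document.

    Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1; the abstract
    does not say which TF-IDF variant the paper uses.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        })
    return weights


def weighted_vector(doc_weights, embeddings, dim):
    """TF-IDF-weighted average of word embeddings (local features).

    Assumption: dividing by the sum of weights; words without an
    embedding are skipped.
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in doc_weights.items():
        vec = embeddings.get(word)
        if vec is None:
            continue
        for i in range(dim):
            total[i] += weight * vec[i]
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


def hybrid_vector(local_vec, topic_vec):
    """Concatenate local (weighted embedding) and global (topic) features."""
    return list(local_vec) + list(topic_vec)


# Toy inputs: hypothetical placeholders, not data from the paper.
docs = [
    ["phone", "battery", "drains"],
    ["battery", "life", "short"],
    ["election", "results", "today"],
]
embeddings = {  # stand-in for trained Word2Vec vectors (dim = 2)
    "phone": [0.9, 0.1], "battery": [0.8, 0.2], "drains": [0.7, 0.1],
    "life": [0.6, 0.3], "short": [0.5, 0.2],
    "election": [0.1, 0.9], "results": [0.2, 0.8], "today": [0.3, 0.6],
}
topic_dists = [  # stand-in for per-document BTM topic distributions
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
]

weights = tfidf_weights(docs)
hybrid = [
    hybrid_vector(weighted_vector(w, embeddings, dim=2), topics)
    for w, topics in zip(weights, topic_dists)
]
```

Each entry of `hybrid` has four components: two from the weighted embedding and two from the topic distribution. Vectors of this kind would then be the input to the similarity computation used by the clustering step; the KNN-based initialization of IAPC is not sketched here because the abstract gives no details of it.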