Advanced Hybrid Information Processing. Second EAI International Conference, ADHIP 2018, Yiyang, China, October 5-6, 2018, Proceedings

Research Article

A News Text Clustering Method Based on Similarity of Text Labels

@INPROCEEDINGS{10.1007/978-3-030-19086-6_55,
    author={Yuqiang Tong and Lize Gu},
    title={A News Text Clustering Method Based on Similarity of Text Labels},
    proceedings={Advanced Hybrid Information Processing. Second EAI International Conference, ADHIP 2018, Yiyang, China, October 5-6, 2018, Proceedings},
    proceedings_a={ADHIP},
    year={2019},
    month={5},
    keywords={Data clustering, MinHash, Hierarchical clustering},
    doi={10.1007/978-3-030-19086-6_55},
    publisher={Springer}
}

Yuqiang Tong¹*, Lize Gu¹*
¹ Beijing University of Posts and Telecommunications
* Contact emails: 1172183723@qq.com, glzisc@bupt.edu.cn

Abstract

As an important type of text, news text has great research value in data mining fields such as hotspot tracking and public opinion analysis. News text clustering is a common method for studying news trends and tracking hotspots. Most existing clustering methods are based on the vector space model and use the TF-IDF weights of the words in a news text as its feature items. To improve clustering performance on news texts, this paper presents a new clustering algorithm that represents each news text as a series of text labels. This representation addresses two problems: the data dimensionality is too high, and the resulting clusters are hard to describe. In addition, by using a conceptual clustering algorithm, the method reduces the number of comparisons. Experimental results show that the algorithm based on text-label similarity improves clustering quality compared with traditional clustering methods.
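The abstract does not specify how label similarity is computed, but the keywords point to MinHash and hierarchical clustering. The following is a minimal Python sketch of that general idea under stated assumptions, not the authors' algorithm. Each news text is assumed to be already reduced to a set of labels (label extraction is not shown). Set similarity is taken to be Jaccard similarity, estimated with MinHash signatures built from seeded BLAKE2b hashes. Clusters are formed by single-linkage merging at a fixed similarity threshold. The function names, the 128-hash signature length, the 0.4 threshold, and the sample label sets are all hypothetical. Unlike the paper's conceptual clustering step, this sketch compares every pair of texts, so it does not reduce the number of comparisons.

```python
# Hypothetical sketch: MinHash-estimated Jaccard similarity over label sets,
# followed by single-linkage clustering at a fixed threshold.
# Not the ADHIP 2018 paper's algorithm; names and parameters are illustrative.
import hashlib
from itertools import combinations


def _hash(label: str, seed: int) -> int:
    """Deterministic 64-bit hash of a label; the seed is used as the BLAKE2b salt."""
    digest = hashlib.blake2b(
        label.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(labels: set[str], num_hashes: int = 128) -> list[int]:
    """Minimum hash value of the label set under each seed (labels must be non-empty)."""
    return [min(_hash(label, seed) for label in labels) for seed in range(num_hashes)]


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing signature positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, for comparison with the MinHash estimate."""
    return len(a & b) / len(a | b)


def cluster(docs: dict[str, set[str]], threshold: float = 0.4,
            num_hashes: int = 128) -> list[list[str]]:
    """Single-linkage clusters: merge any two texts whose estimated similarity >= threshold."""
    names = list(docs)
    sigs = {n: minhash_signature(docs[n], num_hashes) for n in names}
    parent = {n: n for n in names}  # union-find forest over text names

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # All-pairs comparison; the paper describes its conceptual clustering
    # step as reducing the number of comparisons, which this sketch does not do.
    for a, b in combinations(names, 2):
        if estimated_similarity(sigs[a], sigs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical label sets for four news texts.
    news = {
        "a": {"election", "vote", "parliament", "poll"},
        "b": {"election", "vote", "parliament", "campaign"},
        "c": {"football", "league", "goal", "transfer"},
        "d": {"football", "league", "goal", "coach"},
    }
    print(jaccard(news["a"], news["b"]))
    print(cluster(news))
```

Merging every pair whose estimated similarity reaches the threshold (tracked with union-find) yields the same partition as cutting a single-linkage dendrogram at that threshold. With the sample labels, the election stories and the football stories share no labels across topics, so they should fall into two separate clusters.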