
Research Article
Personal Name Disambiguation for Chinese Documents in Online Medium
@INPROCEEDINGS{10.1007/978-3-030-82562-1_23,
    author={Chao Fan and Yu Li},
    title={Personal Name Disambiguation for Chinese Documents in Online Medium},
    proceedings={Multimedia Technology and Enhanced Learning. Third EAI International Conference, ICMTEL 2021, Virtual Event, April 8--9, 2021, Proceedings, Part I},
    proceedings_a={ICMTEL},
    year={2021},
    month={7},
    keywords={Personal name disambiguation, Chinese personal names, Agglomerative clustering},
    doi={10.1007/978-3-030-82562-1_23}
}
Authors: Chao Fan, Yu Li
Year: 2021
Conference: ICMTEL
Publisher: Springer
DOI: 10.1007/978-3-030-82562-1_23
Abstract
Disambiguating different people who share the same name is a critical problem when analyzing content in online media. This paper develops a framework for handling personal names in Chinese datasets. Web pages containing a personal name are first crawled from online websites and standardized. The documents are then parsed with lexical analysis techniques such as word segmentation, part-of-speech tagging, and named entity recognition. We extract several groups of words as features and test different weighting schemes (e.g., Boolean term frequency, absolute term frequency, tf-idf, and entropy weights). For agglomerative clustering, we propose a measure of interdependence within clusters and independence between clusters that automatically determines the number of clusters. Moreover, a technique that merges noise clusters is used to improve the clustering results. Experiments are performed on six groups of Chinese personal names, and the final results confirm the effectiveness of the proposed approach.
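The pipeline outlined in the abstract (term weighting, agglomerative clustering, automatic choice of the number of clusters) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names are hypothetical, the tf-idf variant uses a `1 + log(n / df)` factor, clusters are merged by average-linkage cosine similarity, and the selection score (mean within-cluster similarity minus mean between-cluster similarity) is only a stand-in for the authors' interdependence/independence measure. Entropy weights and noise-cluster merging are not shown, and the English tokens stand in for already-segmented Chinese words.

```python
import math
from collections import Counter


def weight_vectors(docs, scheme="tfidf"):
    """Turn token lists into sparse term -> weight dictionaries."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    vectors = []
    for d in docs:
        tf = Counter(d)
        if scheme == "boolean":
            vec = {t: 1.0 for t in tf}
        elif scheme == "tf":
            vec = {t: float(c) for t, c in tf.items()}
        elif scheme == "tfidf":
            # Assumed variant: tf * (1 + log(n / df)), so no weight is zero.
            vec = {t: c * (1.0 + math.log(n / df[t])) for t, c in tf.items()}
        else:
            raise ValueError("unknown scheme: " + scheme)
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerate(vectors):
    """Average-linkage clustering; returns the similarity matrix and
    every partition from n singleton clusters down to one cluster."""
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    history = [clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                s = sum(sim[x][y] for x in a for y in b) / (len(a) * len(b))
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append(clusters)
    return sim, history


def separation_score(sim, clusters):
    """Stand-in measure: mean within-cluster similarity minus mean
    between-cluster similarity; None if either set of pairs is empty."""
    label = {i: k for k, c in enumerate(clusters) for i in c}
    within, between = [], []
    for i in range(len(sim)):
        for j in range(i + 1, len(sim)):
            (within if label[i] == label[j] else between).append(sim[i][j])
    if not within or not between:
        return None
    return sum(within) / len(within) - sum(between) / len(between)


def disambiguate(docs, scheme="tfidf"):
    """Pick the partition with the highest separation score."""
    sim, history = agglomerate(weight_vectors(docs, scheme))
    scored = [(separation_score(sim, p), p) for p in history]
    scored = [(s, p) for s, p in scored if s is not None]
    if not scored:  # fewer than three documents: keep singletons
        return sorted(sorted(c) for c in history[0])
    best = max(scored, key=lambda sp: sp[0])[1]
    return sorted(sorted(c) for c in best)


# Toy documents mentioning one shared name: three about a physicist,
# three about a footballer.
sample_docs = [
    ["professor", "university", "physics", "research"],
    ["professor", "physics", "laboratory", "research"],
    ["university", "physics", "professor", "lecture"],
    ["striker", "football", "club", "goal"],
    ["football", "club", "league", "goal"],
    ["striker", "goal", "league", "football"],
]
print(disambiguate(sample_docs))
```

On this toy input the two groups share no vocabulary, so the score peaks at two clusters. Real web pages are much noisier, which is presumably where the choice of feature groups, the weighting scheme, and the noise-cluster merging step described in the abstract come in.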